We consider the task of detecting a salient cluster in a sensor network, that is, an undirected
graph with a random variable attached to each node. Motivated by recent research in environmental
statistics and the drive to compete with the reigning scan statistic, we explore alternatives
based on the percolative properties of the network. The first method is based on the size of the
largest connected component after removing the nodes in the network with a value below a
given threshold. The second method is the upper level set scan test introduced by Patil and
Taillie [Statist. Sci. 18 (2003) 457-465]. We establish the performance of these methods in an
asymptotic decision-theoretic framework in which the network size increases. These tests have
two advantages over the more conventional scan statistic: they do not require previous information
about cluster shape, and they are computationally more feasible. We make abundant use
of percolation theory to derive our theoretical results, and complement our theory with some
numerical experiments.

Keywords: hypothesis testing; percolation; scan statistic; surveillance; upper level set scan statistic

1. Introduction 

We consider the problem of cluster detection in a network. The network is modeled as a 
graph, and we assume that a random variable is observed at each node. The task is then 
to detect a cluster, that is, a connected subset of nodes with values that are larger than 
usual. There are a multitude of applications for which this model is relevant; examples 
include detection of hazardous materials (Hills [25]) and target tracking (Li et al. [35]) in 
sensor networks (Culler, Estrin and Srivastava [12]), and detection of disease outbreaks 
(Heffernan et al. [24]; Rotz and Hughes [49]; Wagner et al. [53]). Pixels in digital images
are also sensors, and thus many other applications are found in the rich literature on
image processing, for example, road tracking (Geman and Jedynak [20]), fire prevention
using satellite imagery (Pozo, Olmo and Alados-Arboledas [47]), and the detection
of cancerous tumors in medical imaging (McInerney and Terzopoulos [36]).

After specifying a distributional model for the observations at the nodes and a class 
of clusters to be detected, the generalized likelihood ratio (GLR) test is the first method 
that comes to mind. Indeed, this is by far the most popular method in practice, and 
as such, is given different names in different fields. The likelihood ratio is known as the 
scan statistic in spatial statistics (Kulldorff [29, 30]) and the corresponding test as the 
method of matched filters in engineering (Jain, Zhong and Dubuisson-Jolly [27]; McInerney
and Terzopoulos [36]). Here we use the former, where scanning a given cluster K
means computing the likelihood ratio for the simple alternative where K is the anoma- 
lous cluster. Various forms of scan statistic have been proposed, differing mainly by the 
assumptions made on the shape of the clusters. Most methods assume that the clusters 
are in some parametric family (e.g., circular (Kulldorff and Nagarwalla [33]), elliptical 
(Hobolth, Pedersen and Jensen [26]; Kulldorff et al. [32])) or, more generally, deformable
templates (Jain, Zhong and Dubuisson-Jolly [27]). Sometimes no explicit shape is assumed,
leading to nonparametric models (Duczmal and Assunção [16]; Kulldorff, Fang
and Walsh [31]; Tango and Takahashi [51]).

We consider two alternative nonparametric methods, both based on the percolative 
properties of the network, that is, based on the connected components of the graph 
after removing the nodes with values below a given threshold. The simplest is based 
on the size of the largest connected component after thresholding - the threshold is 
the only parameter of this method. If the graph is a one-dimensional lattice, then after 
thresholding, this corresponds to the test based on the longest run (Balakrishnan and 
Koutras [4]), which Chen and Huo [9] adapt for path detection in a thin band. This 
test has been studied in a similar context in a series of papers¹ (Davies, Langovoy and
Wittich [14]; Langovoy and Wittich [34]) under the name of maximum cluster test. The
idea behind this method is simple. When an anomalous cluster is indeed present, the 
values at the nodes belonging to this cluster are larger than usual and thus more likely to 
survive the threshold, and because these nodes are also likely to clump together - because 
the cluster is connected in the graph - the size of the statistic will be (stochastically) 
larger than when no anomalous cluster is present. 
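
As a concrete illustration of this first method, the statistic can be computed with a breadth-first search over the thresholded grid. The following is a minimal sketch, assuming a two-dimensional grid stored as a list of rows with 4-neighbor adjacency; the function name `largest_open_cluster` is ours, chosen for illustration only.

```python
from collections import deque


def largest_open_cluster(grid, t):
    """Size of the largest connected component of {v : grid[v] > t}.

    Assumptions (ours): `grid` is a non-ragged list of rows and two nodes
    are adjacent when they differ by one step along a single axis.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or grid[i][j] <= t:
                continue  # already explored, or removed by the threshold
            seen[i][j] = True
            queue = deque([(i, j)])
            size = 0
            while queue:
                a, b = queue.popleft()
                size += 1
                for x, y in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= x < rows and 0 <= y < cols
                            and not seen[x][y] and grid[x][y] > t):
                        seen[x][y] = True
                        queue.append((x, y))
            best = max(best, size)
    return best
```

The test would then reject when the returned size exceeds a critical value; how that value is calibrated (for instance by simulation under the null) is left open in this sketch.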

More sophisticated, and also parameter-free, is the method based on the upper level set 
scan statistic of Patil and Taillie [41], subsequently developed in the context of ecological
and environmental applications (Patil, Joshi and Koli [38]; Patil and Taillie [42]; Patil et
al. [37, 39]). It is the result of scanning over the connected components of the graph after
thresholding, which is repeated at all thresholds. This method obviously is closely related 
to the scan statistic. It can be seen as attempting to approximate the scan statistic over 
all possible connected components of the graph by restricting the class of subsets to 
be scanned to those surviving a threshold. Our results indicate that this method is in 
fact more closely related to the previous one (based on the size of the largest connected
component at a given threshold), and in some sense provides a way to automatically
choose the threshold.

¹The authors were not aware of this unpublished line of work until M. Langovoy contacted them in
the final stages of manuscript preparation.

These two percolation-based methods have two significant advantages over the scan 
statistic. First, they do not need to be provided with the shape of the clusters to be 
detected. Thus they are valuable in settings with less previous spatial information. The 
second advantage is computational. The scan statistic tends to be computationally de- 
manding, even in parametric settings, or even outright intractable, particularly in non- 
parametric settings. In contrast, these two methods are computationally feasible, and 
their implementation is fairly straightforward, even for irregular networks. On the other 
hand, the scan statistic often relies on the fast Fourier transform in the square lattice to 
scan clusters of known shape over all locations in that network. 

In terms of detection performance, we compare these percolation-based methods to the 
scan statistic in a standard asymptotic decision theoretic framework where the network is 
a square lattice of growing size and the variables at the nodes are assumed i.i.d. for nodes 
inside (resp., outside) the anomalous cluster. The performance of the scan statistic in 
such a framework is well understood and known to be (near-) optimal, which makes it the 
gold standard in detection (Arias-Castro, Candes and Durand [1]; Arias-Castro, Donoho 
and Huo [3]; Perone Pacifico et al. [45]; Walther [54]). We find that these two methods
are suboptimal for the detection of hypercubes, an emblematic parametric class, but 
are near-optimal for the detection of self-avoiding paths, an emblematic nonparametric 
class. The main weakness of these percolation-based methods is that when the per-node 
signal-to-noise ratio is weak, the connected components after thresholding are heavily 
influenced by the whimsical behavior of the values at the nodes. The scan statistic is
very effective in such situations. Although this rationale seems to apply particularly well 
in the case of self-avoiding paths, what makes these methods competitive in this case is 
that the problem of detecting such objects is intrinsically very hard. 

The study of the connected components after thresholding is intrinsically connected to 
percolation theory (Grimmett [21]), an important branch in probability theory. In fact, 
when the node values are i.i.d. - which is the case when no anomalous cluster is present 
- the only dependence on the distribution at the nodes is the probability of surviving the 
threshold, and after thresholding, the network is a site percolation model. (We introduce 
and discuss these notions in detail later in the article.) Our contribution is a careful
analysis of these two nonparametric methods that relies substantially on percolation
theory (Grimmett [21]), thereby shedding light on an important problem in statistics.

The rest of the paper is organized as follows. In Section 2 we formally introduce the 
framework and state some fundamental detection bounds. In Section 3 we describe the 
standard scan statistic and present some results on its performance, showing that it is 
essentially optimal. In Section 4, we consider the size of the largest connected component 
after thresholding. In Section 5, we consider the upper level set scan statistic. We briefly 
discuss implementation issues and present some numerical experiments in Section 6. 
Finally, Section 7 is a discussion section where, in particular, we mention extensions. We 
provide proofs in the Appendix. 

2. Mathematical framework and fundamental detection bounds

For concreteness, and also for its relevance to signal and image processing, we model
the network as a finite subgrid of the regular square lattice in dimension $d$, denoted
$\mathbb{V}_m := \{1, \ldots, m\}^d$. Our analysis is asymptotic in the sense that the network is assumed
to be large, that is, $m \to \infty$. To each node $v \in \mathbb{V}_m$, we attach a random variable, $X_v$. For
example, in the context of a sensor network, the nodes represent the sensors and the variables
represent the information that they transmit. The random variables $\{X_v : v \in \mathbb{V}_m\}$
are assumed to be independent with common distribution in a certain one-parameter
exponential family $\{F_\theta : \theta \in [0, \theta_\infty)\}$, defined as follows. Let $\theta_\infty > 0$, let $F_0$ be a distribution
function with finite non-zero variance $\sigma_0^2$, and assume that the moment-generating
function $\varphi(\theta) := \int e^{\theta x} \, dF_0(x)$ is finite for $\theta \in [0, \theta_\infty)$. Then $F_\theta$ is the distribution function
with density $f_\theta(x) = \exp(\theta x - \log \varphi(\theta))$ with respect to $F_0$. We assume further regularity
of $F_0$ at later points in this paper. Note that our results apply to other distributional
models as well, as discussed in Section 7.

Examples of such a family $\{F_\theta : \theta \in [0, \theta_\infty)\}$ include the following:

• Bernoulli model: $F_\theta = \mathrm{Ber}(p_\theta)$, $p_\theta := \mathrm{logit}^{-1}(\theta + \theta_0)$, relevant in sensor arrays where
each sensor transmits one bit (i.e., makes a binary decision)

• Poisson model: $F_\theta = \mathrm{Poi}(\theta + \theta_0)$, popular with count data, for example, arising in
infectious disease surveillance systems

• Exponential model: $F_\theta = \mathrm{Exp}(\theta_0 - \theta)$ (e.g., to model response times)

• Normal location model: $F_\theta = \mathcal{N}(\theta + \theta_0, 1)$, standard in signal and image processing,
where noise is often assumed to be Gaussian.

Let $\mathcal{K}_m$ be a class of clusters, with a cluster defined as a subset of nodes connected in
the graph. Under the null hypothesis, all of the variables at the nodes have distribution
$F_0$, that is,

$$H_0^m : X_v \sim F_0, \quad \forall v \in \mathbb{V}_m.$$

Under the particular alternative where $K \in \mathcal{K}_m$ is anomalous, the variables indexed by
$K$ have distribution $F_{\theta_m}$ for some $\theta_m > 0$, that is,

$$H_{1,K}^m : X_v \sim F_{\theta_m}, \ \forall v \in K; \qquad X_v \sim F_0, \ \forall v \notin K.$$

We are interested in the situation where the anomalous cluster $K$ is unknown, namely
in testing $H_0^m$ against $H_1^m := \bigcup_{K \in \mathcal{K}_m} H_{1,K}^m$. We illustrate the setting in Figure 1 in the
context of the two-dimensional square grid.
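
To make the null and alternative hypotheses concrete, the sketch below draws one realization of the grid under either hypothesis. It assumes the normal location model with $\theta_0 = 0$ (so node values are $\mathcal{N}(\theta, 1)$ inside the anomalous cluster and $\mathcal{N}(0, 1)$ outside) in dimension $d = 2$; the helper names `square_cluster` and `simulate_grid` are hypothetical and serve only this illustration.

```python
import random


def square_cluster(top, left, side):
    """Nodes of a side-by-side square whose upper-left corner is (top, left)."""
    return {(i, j) for i in range(top, top + side)
            for j in range(left, left + side)}


def simulate_grid(m, theta=0.0, cluster=frozenset(), seed=0):
    """One draw of (X_v) on the m-by-m grid.

    X_v ~ N(theta, 1) for v in `cluster` and X_v ~ N(0, 1) elsewhere
    (normal location model with theta_0 = 0, an assumption of this sketch).
    With theta = 0 or an empty cluster this is a draw from the null.
    """
    rng = random.Random(seed)
    grid = []
    for i in range(m):
        row = []
        for j in range(m):
            noise = rng.gauss(0.0, 1.0)  # same noise stream whatever theta is
            shift = theta if (i, j) in cluster else 0.0
            row.append(noise + shift)
        grid.append(row)
    return grid
```

Because the noise stream depends only on the seed, calling `simulate_grid` twice with the same seed, once with `theta=0.0` and once with a positive `theta`, yields a matched null and alternative pair that differ only on the cluster.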

Let $\mathcal{K}_m$ denote a cluster class for $\mathbb{V}_m$. As usual, a test $T$ is a function of the data,
$T = T(X_v : v \in \mathbb{V}_m)$, that takes values in $\{0, 1\}$, with $T = 1$ corresponding to a rejection,
meaning a decision in favor of $H_1^m$. For a test $T$, we define its worst-case risk as the
sum of its probability of type I error and its probability of type II error maximized over the
anomalous clusters in the class:

$$\gamma_m(T) = \mathbb{P}(T = 1 \mid H_0^m) + \max_{K \in \mathcal{K}_m} \mathbb{P}(T = 0 \mid H_{1,K}^m).$$

Figure 1. This figure illustrates the setting in dimension $d = 2$ for a beta model where
$F_0 = \mathrm{Unif}(0, 1)$ and $F_\theta = \mathrm{Beta}(\theta + 1, 1)$, $\theta > 0$. (Left) An instance of the null hypothesis. (Middle)
An instance of an alternative with a square cluster. (Right) An instance of an alternative with
a path.

A method is formally defined as a sequence of tests $(T_m)$ for testing $H_0^m$ versus $H_1^m$. We
say that a method $(T_m)$ is asymptotically powerless if

$$\liminf_{m \to \infty} \gamma_m(T_m) \geq 1.$$

This amounts to saying that as the size of the network increases, the method $(T_m)$ is not
substantially better than random guessing. Conversely, a method $(T_m)$ is asymptotically
powerful if

$$\lim_{m \to \infty} \gamma_m(T_m) = 0.$$

The minimax risk is defined as $\gamma_m^* := \inf_T \gamma_m(T)$, and we say that a method $(T_m)$ is
(asymptotically) optimal if $\gamma_m(T_m) \to 0$ whenever $\gamma_m^* \to 0$. Everything else fixed, the
latter depends on the behavior of $\theta_m$ when $m$ becomes large. We say that $(T_m)$ is optimal
up to a multiplicative constant $C > 1$ if $\gamma_m(T_m) \to 0$ under $C\theta_m$ whenever $\gamma_m^* \to 0$ under
$\theta_m$. We say that $(T_m)$ is near-optimal if the same is true with $C$ replaced by $C_m \to \infty$
with $\log C_m = o(\log \theta_m)$. (This occurs here only when $\theta_m \to 0$ polynomially fast and
$C_m \to \infty$ poly-logarithmically fast.)

We focus on situations where the clusters in the class $\mathcal{K}_m$ are of the same size, increasing
with $m$ but negligible compared with the size of the entire network. We do so for the
sake of simplicity; more general results could be obtained as in Arias-Castro, Candes and
Durand [1], Arias-Castro, Donoho and Huo [3], Perone Pacifico et al. [45], Walther [54]
without additional difficulty. Assuming a large anomalous cluster allows us to state general
results applying to a wide range of one-parameter exponential families (via the central
limit theorem). In addition, note that on the one hand, reliably detecting a cluster of
bounded size is impossible in the Bernoulli model or any other model where $F_0$ has finite
support, whereas on the other hand, detecting a cluster of size comparable to that of the
entire network is in some sense trivial, given that the simple test based on the total sum
$\sum_{v \in \mathbb{V}_m} X_v$ is optimal up to a multiplicative constant.

We consider two emblematic classes of clusters, in some sense at the opposite extremes:

• Hypercube detection. Let $\mathcal{K}_m$ denote the class of hypercubes within $\mathbb{V}_m$ of side length
$[m^\alpha]$ with $0 < \alpha < 1$. This class is parametric, with the location of the hypercube
the only parameter.

• Path detection. Let $\mathcal{K}_m$ denote the class of loopless paths within $\mathbb{V}_m$ of length $[m^\alpha]$
with $0 < \alpha < 1$. This class is nonparametric, in the sense that its cardinality is
exponential in the length of the paths.

See Figure 1 for an illustration. (Note that a hypercube of side length $k$ may be seen
as a loopless path of length $k^d$.) Although we obtain results for both, our main focus is
in the setting of hypercube detection, which is relevant to a wider range of applications, 
in fact any situation where the task is to detect a shape that is not filamentary. The 
situation exemplified in the setting of path detection may be relevant in target tracking 
from video, or the detection of cracks in materials in non-destructive testing. Note that 
the two settings coincide in dimension one. 

We state fundamental detection bounds for each setting. The following result is stan- 
dard (see, e.g., Arias-Castro, Candes and Durand [1]; Arias-Castro, Donoho and Huo [3]). 
Remember that $\sigma_0^2$ denotes the variance of $F_0$.

Lemma 1. In hypercube detection, all methods are asymptotically powerless if

$$\limsup_{m \to \infty} \, (\log m)^{-1/2} \, m^{d\alpha/2} \, \theta_m < \sigma_0^{-1} \sqrt{2d(1 - \alpha)}.$$

In fact, the conclusions of Lemma 1 apply for a wide variety of parametric classes, such
as discs, a popular model in disease outbreak detection (Kulldorff and Nagarwalla [33]),
as well as to nonparametric classes of blob-like clusters (see Arias-Castro, Candes and
Durand [1]; Arias-Castro, Donoho and Huo [3]).

The following result is taken from Arias-Castro et al. [2].

Lemma 2. In path detection, all methods are asymptotically powerless if
$\lim_{m \to \infty} \theta_m (\log m)(\log\log m)^{1/2} = 0$ in dimension $d = 2$, and the same is true in dimension $d \geq 3$ if
$\limsup_{m \to \infty} \theta_m < \theta_*$, where $\theta_* > 0$ depends only on $d$.

In dimension $d > 4$, $\theta_*$ may be taken to be the unique solution to

[...]

where $p$ is the return probability of a symmetric random walk in dimension $d$.



3. The scan statistic 



For a subset of nodes $K \subset \mathbb{V}$, let $|K|$ denote its size and define

$$\bar X_K := \frac{1}{|K|} \sum_{v \in K} X_v.$$




Cluster detection in networks using percolation 






Given a cluster class $\mathcal{K}$, we define the (simple) scan statistic as

$$\max_{K \in \mathcal{K}} \sqrt{|K|} \, (\bar X_K - \mu_0), \tag{1}$$

where $\mu_0$ is the mean of $F_0$. If $\mu_0$ is not available, we may use the grand mean $\bar X_{\mathbb{V}_m}$
instead. In Appendix B, we derive this form of the scan statistic as an approximation
to the scan statistic of Kulldorff [29], which is, strictly speaking, the GLR and arguably
the most popular version, particularly in spatial statistics. We use this simpler form to
streamline our theoretical analysis.

The test that rejects for large values of the scan statistic (1), which we call the scan 
test, is near-optimal in a wide range of settings (Arias-Castro, Candes and Durand [1]; 
Arias-Castro, Donoho and Huo [3]; Walther [54]). In particular, in the context of a class 
of hypercubes, and in fact many other parametric classes, this test is asymptotically 
optimal to the exact multiplicative constant. 
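
For the class of squares of a known side length in dimension two, the simple scan statistic $\max_K \sqrt{|K|}(\bar X_K - \mu_0)$ can be evaluated exhaustively. The sketch below does this with a summed-area table; it is one straightforward way to carry out the scan, not the implementation used for the experiments reported later, and the name `square_scan` is hypothetical.

```python
import math


def square_scan(grid, side, mu0=0.0):
    """Simple scan statistic over all side-by-side squares of a 2D grid.

    Returns (statistic, (top, left)) for the maximizing square.
    Assumes `grid` is a non-empty, non-ragged list of rows.
    """
    rows, cols = len(grid), len(grid[0])
    # prefix[i][j] holds the sum of grid[a][b] over a < i and b < j.
    prefix = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (grid[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    size = side * side
    best, arg = -math.inf, None
    for i in range(rows - side + 1):
        for j in range(cols - side + 1):
            total = (prefix[i + side][j + side] - prefix[i][j + side]
                     - prefix[i + side][j] + prefix[i][j])
            stat = math.sqrt(size) * (total / size - mu0)
            if stat > best:
                best, arg = stat, (i, j)
    return best, arg
```

Each square costs a constant number of operations once the table is built, so the whole scan is linear in the number of nodes for a fixed side length.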

Lemma 3. In hypercube detection, the scan test is asymptotically powerful if

$$\liminf_{m \to \infty} \, (\log m)^{-1/2} \, m^{d\alpha/2} \, \theta_m > \sigma_0^{-1} \sqrt{2d(1 - \alpha)}.$$

In the context of a class of paths, the following result states that the scan test detects
if $\theta_m$ is bounded away from $0$ and sufficiently large. Note that this does not match the
order of magnitude of the lower bound given in dimension $d = 2$. Let $\Lambda(\theta) = \log \varphi(\theta)$ and
$\Lambda^*(x) = \sup_{\theta \geq 0} [\theta x - \Lambda(\theta)]$. ($\Lambda^*$ is the rate function of $F_0$ when $x > \mu_0$.) The following
result is established in Arias-Castro et al. [2].

Lemma 4. In path detection, the scan test is asymptotically powerful if

$$\liminf_{m \to \infty} \theta_m > \theta^* := (\Lambda^* \circ \Lambda')^{-1}(\log(2d)).$$
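
The threshold in Lemma 4 can be evaluated numerically once $\Lambda$ is known. As a hedged illustration, the sketch below solves $\Lambda^*(\Lambda'(\theta)) = \log(2d)$ by bisection; it assumes that $\theta \mapsto \Lambda^*(\Lambda'(\theta))$ is increasing on the search interval, and it uses the standard normal case $F_0 = \mathcal{N}(0, 1)$ as an example, where $\Lambda(\theta) = \theta^2/2$, $\Lambda'(\theta) = \theta$ and $\Lambda^*(x) = x^2/2$, so that the solution is $\sqrt{2 \log(2d)}$ in closed form. The function names are ours.

```python
import math


def scan_path_threshold(d, rate_of_tilt, hi=50.0, tol=1e-12):
    """Solve rate_of_tilt(theta) = log(2d) for theta in [0, hi] by bisection.

    `rate_of_tilt` stands for theta -> Lambda*(Lambda'(theta)) and is assumed
    to be increasing on [0, hi] (an assumption of this sketch).
    """
    target = math.log(2 * d)
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_of_tilt(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_rate_of_tilt(theta):
    """Lambda*(Lambda'(theta)) for F_0 = N(0, 1): equals theta**2 / 2."""
    return 0.5 * theta * theta
```

For other families only `rate_of_tilt` needs to change; the bisection itself is generic.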



4. Size of the largest open cluster 

We study the test based on the size of the largest connected component after thresholding 
the values at the nodes. This test was independently considered in a series of papers 
(Davies, Langovoy and Wittich [14]; Langovoy and Wittich [34]). Our results are seen to 
sharpen and elaborate on these results. In particular, we study this test under all three 
regimes (subcritical, supercritical, and critical). 

Adapting terminology from percolation theory (Grimmett [21]), for a threshold $t \in \mathbb{R}$,
we say that a subset $K \subset \mathbb{V}$ is open (at threshold $t$) if $X_v > t$ for all $v \in K$. Let $S_m(t)$
(resp., $S_K(t)$) denote the size of the largest open cluster within $\mathbb{V}_m$ (resp., within $K$). The
analysis of the test based on $S_m(t)$, which we call the largest open cluster (LOC) test, boils
down to bounding the size of $S_m(t)$ from above, under $H_0^m$, and, because $S_m(t) \geq S_K(t)$,
bounding the size of $S_K(t)$ from below, under $H_{1,K}^m$. Define $\xi_v(t) = \mathbb{1}\{X_v > t\}$, which
is Bernoulli with parameter $p_\theta(t) := \mathbb{P}_\theta(X_v > t)$. The process $(\xi_v(t) : v \in \mathbb{V}_m)$ is a site
percolation model (Grimmett [21]). In general, consider a process $(\xi_v : v \in \mathbb{V}_m)$ i.i.d.
Bernoulli with parameter $p$, and let $S_m$ denote the size of the largest open cluster within
$\mathbb{V}_m$. In dimension $d = 1$, this process may be seen as a sequence of coin tosses, and $S_m$
viewed as the longest heads run in that sequence. In this context, the Erdős-Rényi Law
(Erdős and Rényi [17]) says that

$$\frac{S_m}{\log m} \to \frac{1}{\log(1/p)} \quad \text{almost surely.} \tag{2}$$
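
The one-dimensional statement is easy to probe by simulation. The sketch below computes the longest run of heads in a sequence of $m$ coin tosses with heads probability $p$ and returns the ratio of $S_m / \log m$ to its almost-sure limit $1/\log(1/p)$; the function names are hypothetical. Convergence is slow (logarithmic in $m$), so for moderate $m$ the ratio should be expected near 1 rather than equal to it.

```python
import math
import random


def longest_run(bits):
    """Length of the longest run of truthy entries in `bits`."""
    best = current = 0
    for b in bits:
        current = current + 1 if b else 0
        best = max(best, current)
    return best


def erdos_renyi_ratio(m, p, seed=0):
    """Ratio of S_m / log(m) to 1 / log(1/p) for m i.i.d. Bernoulli(p) tosses."""
    rng = random.Random(seed)
    bits = [rng.random() < p for _ in range(m)]
    return longest_run(bits) * math.log(1.0 / p) / math.log(m)
```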

In higher dimensions $d \geq 2$, the situation is much more involved. Let $p_c$ denote the critical
probability for site percolation in $\mathbb{Z}^d$, defined as the supremum over all $p \in (0, 1)$ such that
the size of the open cluster at the origin, denoted by $S$, is finite with probability 1. (The
dependency in $d$ is left implicit.) We consider the subcritical ($p_0(t) < p_c$), supercritical
($p_0(t) > p_c$), and near-critical ($p_0(t) \approx p_c$) cases separately.



4.1. Subcritical percolation 

In the subcritical case, where $t$ is such that $p_0(t) < p_c$, we are able to obtain precise,
rigorous results on the performance of the test based on $S_m(t)$ in terms of the function
$\zeta_p$, implicitly defined as

$$\zeta_p := -\lim_{k \to \infty} \frac{1}{k} \log \mathbb{P}(S \geq k) = -\lim_{k \to \infty} \frac{1}{k} \log \mathbb{P}(S = k) \tag{3}$$

(see Grimmett [21], Section 6.3). Again, the dependency in $d$ is left implicit. As a function
of $p \in (0, p_c)$, $\zeta_p$ is continuous and strictly decreasing, with limits $\infty$ at $p = 0$ and $0$ at
$p = p_c$ (see Lemma A.1), whereas $\zeta_p = 0$ for $p \geq p_c$. In the Appendix, we include a proof
that

$$\frac{S_m}{\log m} \to \frac{d}{\zeta_p} \quad \text{in probability} \tag{4}$$

for a subcritical threshold $p < p_c$.

The convergence result in (4) may be used to bound $S_m(t)$ under the null by taking
$p = p_0(t)$. Under the alternative, if we consider a class of hypercubes, then (4) also may
be used to bound $S_K(t)$, because $K$ is a scaled version of $\mathbb{V}_m$.

Theorem 1. In hypercube detection, the test based on $S_m(t)$, with $t$ fixed such that
$0 < p_0(t) < p_c$, is asymptotically powerful if $\liminf_{m \to \infty} \theta_m > \theta_*(t)$, and asymptotically
powerless if $\limsup_{m \to \infty} \theta_m < \theta_*(t)$, where $\theta_*(t)$ is the unique solution to $\zeta_{p_\theta(t)} = \alpha \, \zeta_{p_0(t)}$.

Note that when $t$ is fixed, $\zeta_{p_\theta(t)}$ as a function of $\theta$ is continuous and strictly
decreasing, by the fact that $p_\theta(t)$ is continuous and strictly increasing in $\theta$ (Brown [7],
Cor. 2.6, 2.22) and $\zeta_p$ is continuous and strictly decreasing in $p$ (Lemma A.1). Therefore,
$\theta_*(t)$ in the theorem is well defined.
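
In dimension one the threshold of Theorem 1 can be computed explicitly, because the longest-run law (2) gives $\zeta_p = \log(1/p)$ and every threshold is subcritical ($p_c = 1$). Reading the defining equation as $\zeta_{p_\theta(t)} = \alpha \, \zeta_{p_0(t)}$, this amounts to solving $p_\theta(t) = p_0(t)^\alpha$. The sketch below does so by bisection for the normal location model with $\theta_0 = 0$, which is an assumption of this example; the function names are ours.

```python
import math


def survival_normal(t, theta):
    """p_theta(t) = P(X > t) for X ~ N(theta, 1)."""
    return 0.5 * math.erfc((t - theta) / math.sqrt(2.0))


def loc_threshold_1d(t, alpha, hi=50.0, tol=1e-12):
    """Solve log(1/p_theta(t)) = alpha * log(1/p_0(t)) for theta in [0, hi].

    Dimension d = 1 only, where zeta_p = log(1/p); 0 < alpha < 1 is assumed.
    """
    target = alpha * math.log(1.0 / survival_normal(t, 0.0))
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # zeta decreases in theta: while it is above the target, move right.
        if math.log(1.0 / survival_normal(t, mid)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `loc_threshold_1d(1.0, 0.5)` returns the value of $\theta$ at which an interval of length about $m^{1/2}$ starts to produce longer runs above $t = 1$ than the null does over the whole line.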

If instead, we consider a class of paths, then (2) may be used to bound $S_K(t)$, because
$K$ is a scaled version of the lattice in dimension 1. In congruence with (2), we define
$\zeta_p^1 = \log(1/p)$.

Theorem 2. In path detection, the test based on $S_m(t)$, with $t$ fixed such that $0 < p_0(t) < p_c$,
is asymptotically powerful if $\liminf_{m \to \infty} \theta_m > \theta^+(t)$, and asymptotically powerless if
$\limsup_{m \to \infty} \theta_m < \theta^-(t)$, where $\theta^+(t)$ (resp., $\theta^-(t)$) is the unique solution to
$d \, \zeta^1_{p_\theta(t)} = \alpha \, \zeta_{p_0(t)}$ (resp., $d \, \zeta_{p_\theta(t)} = \alpha \, \zeta_{p_0(t)}$).

Note that in dimension $d \geq 2$, the result is not sharp, because we always have $\theta^+(t) > \theta^-(t)$.
We believe that sharper forms of this result may be substantially more involved,
and for this reason we have not pursued this.

Qualitatively, the message is that for both hypercube detection and path detection, the
subcritical LOC test requires that $\theta_m$ be larger than a constant to be effective. Compared
with the scan statistic, this makes it grossly suboptimal when detecting hypercubes and
comparable (up to a multiplicative constant in $\theta_m$) when detecting self-avoiding paths.

What if we let $t = t_m \to \infty$, so that $p_0(t_m) \to 0$? Then the test based on $S_m(t_m)$ is
powerless under some additional conditions on $F_0$. For $b, C > 0$, consider the following
class of approximately exponential power (AEP) distributions, sometimes called Subbotin
distributions:

$$\mathrm{AEP}(b, C) = \{F : x^{-b} \log \bar F(x) \to -C, \ x \to \infty\}.$$

($\bar F(x) := 1 - F(x)$ is the survival distribution function of $X \sim F$.) For example,
$\mathrm{Exp}(\lambda) \in \mathrm{AEP}(1, \lambda)$ and $\mathcal{N}(\mu, \sigma^2) \in \mathrm{AEP}(2, 1/(2\sigma^2))$, whereas $\mathrm{Poi}(\lambda)$ behaves roughly as a
distribution in $\mathrm{AEP}(1, C)$.

Proposition 1. Assume that $F_0 \in \mathrm{AEP}(b, C)$ for some $b > 1$ and $C > 0$. In hypercube
detection, the test based on $S_m(t)$ is asymptotically powerless when $t = t_m \to \infty$, unless
$\theta_m \to \infty$.



4.2. Supercritical percolation 

Here we consider the supercritical regime, where $p_0(t) > p_c$. (Note that necessarily $d \geq 2$,
for $p_c = 1$ in dimension 1.) In this setting, too, the size of the largest cluster is well
understood. Let $\Theta_p$ be the probability that the open cluster at the origin is infinite, and
note that $\Theta_p > 0$ for $p > p_c$, by the definition of $p_c$. We have with probability 1 that

$$\frac{S_m}{m^d} \to \Theta_p$$

(see Falconer and Grimmett [18], Lemma 2 and proof, Penrose and Pisztora [44], Theorem 4,
Pisztora [46]). In fact (with probability $1 - o(1)$), the largest open cluster within
$\mathbb{V}_m$ is unique, and the foregoing statement says that it occupies a non-negligible fraction
of $\mathbb{V}_m$. With a supercritical choice of threshold, the LOC test is powerless for any $\theta$ if
the anomalous cluster is too small, specifically if $\alpha < 1/2$ in the setting of hypercube
detection. Indeed, we have the following result.

Theorem 3. In hypercube detection, the test based on $S_m(t)$, with $t$ fixed such that
$p_c < p_0(t) < 1$, is asymptotically powerful if $\alpha > 1/2$ and $\lim_{m \to \infty} \theta_m m^{(\alpha - 1/2)d} = \infty$, and
asymptotically powerless if $\alpha < 1/2$ or if $\lim_{m \to \infty} \theta_m m^{(\alpha - 1/2)d} = 0$.

Thus, for the detection of small clusters, a supercritical LOC test is potentially worthless,
whereas for larger clusters it improves substantially on the performance of a subcritical
LOC test, although it is still suboptimal compared with the scan statistic. (Indeed,
comparing the exponents when $\alpha > 1/2$, we have $(\alpha - 1/2)d < \alpha d/2$, because $\alpha < 1$.) We
mention that in the context of path detection, the same arguments show that the LOC
test for any choice of supercritical threshold is asymptotically powerless.



4.3. Critical percolation 

If our goal is to choose a threshold $t$ so as to maximize the difference in size for the
largest open cluster under the null and under an alternative, then we are necessarily in
the neighborhood of the percolation phase transition, which is to say that $|p - p_c|$ is
small. (Again, here we assume $d \geq 2$.) The percolation model is not fully understood in
the critical regime, which poses a serious obstacle to a rigorous statistical analysis. (See
Grimmett [21], Chapter 9, for a general discussion of this percolation regime.) We base
our discussion on the work of Borgs et al. [6]. Let $\pi_m(p)$ denote the probability that
the open cluster at the origin reaches outside the box $[-m, m]^d$, and let $\xi(p)$ denote the
correlation length, defined as

$$\frac{1}{\xi(p)} := -\lim_{m \to \infty} \frac{1}{m} \log \pi_m(p).$$

Note that, with $\xi$ thus defined, $\xi(p) < \infty$ if and only if $p < p_c$. The critical exponent for
(subcritical) correlation length is postulated to be

$$\nu := -\lim_{p \uparrow p_c} \frac{\log \xi(p)}{\log |p - p_c|}.$$

It is not known whether the limit exists for all dimensions, but it is known that $0 < \nu < \infty$
whenever it exists. It is shown in Borgs et al. [6] that, subject to the existence of this
limit together with other scaling assumptions, when $p = p_m$ varies with $m$,

$$S_m \asymp_P \begin{cases} \log m, & \text{if, for some } \nu' > \nu, \ m^{1/\nu'} (p_m - p_c) \to -\infty, \\ m^d, & \text{if, for some } \nu' > \nu, \ m^{1/\nu'} (p_m - p_c) \to \infty, \end{cases} \tag{5}$$

where $X_m \asymp_P Y_m$ means that there exists a constant $C \in (0, \infty)$ such that
$1/C \leq X_m / Y_m \leq C$ in probability. The scaling assumptions of Borgs et al. [6] are believed
to hold if and only if the number $d$ of dimensions satisfies $2 \leq d \leq 6$, and they are proved
for $d = 2$. The work of Borgs et al. [6] was directed at bond percolation only, but similar
results are expected for site percolation.

It is known that $\nu = 4/3$ for site percolation on the triangular lattice (see Smirnov and
Werner [50]), and it is believed that this holds for percolation on any two-dimensional
lattice. As described in Grimmett [21], Section 10.4, it is believed that $\nu = 1/2$ for $d \geq 6$,
and this has been proved for $d \geq 19$ and for the so-called "spread-out model" in 7 and
more dimensions (Hara, van der Hofstad and Slade [23]).

Subject to the assumption that (5) holds, we establish the power of the test based on
$S_m(t)$ when choosing $t = t_m$ near criticality. We assume that there exists $t_c$ such that
$p_0(t_c) = p_c$, and that $p_0(t)$ is a continuous function of $t$ in a neighborhood of $t_c$.

Theorem 4. Let $t_m > t_c$ be such that $p_c - p_0(t_m) \asymp m^{-1/\nu'}$ for some $\nu' > \nu$. In hypercube
detection, assuming that (5) holds, the test based on $S_m(t_m)$ is asymptotically powerful
if $\liminf_{m \to \infty} \theta_m m^{\alpha/\nu'}$ is sufficiently large.

Compared with a subcritical choice of threshold, which requires that $\theta_m$ be bounded
away from $0$ for the test to have any power, as seen in Theorem 1, with a near-critical
choice of threshold, the test is able to detect at polynomially small $\theta_m$. In particular,
with a proper choice of threshold, the test is powerful for $\theta_m$ of order $m^{-\alpha/\nu'}$ with
$\nu' > \nu$. Note that, by Lemma 1, all methods are asymptotically powerless if $\theta_m$ is of
order $m^{-d\alpha/2}$, implying that $\alpha/\nu \leq d\alpha/2$. We thus obtain the inequality $\nu \geq 2/d$. This
may be compared with the scaling relation (Grimmett [21], Equation (9.23)) stating that
$d\nu = 2 - a$, where $a$ $(< 0)$ is the percolation critical exponent for the number of clusters
per vertex (written $a$ here to distinguish it from the cluster-size exponent $\alpha$). It is believed
that $a = -2/3$ when $d = 2$ and $a = -1$ when $d \geq 6$. Compared
with the performance at supercriticality, the test at near-criticality (with a proper choice
of threshold) is superior if $(\alpha - 1/2)d < \alpha/\nu$, which is equivalent to $\alpha < (1 - a/2)/(1 - a)$.
For example, with $a = -2/3$, the near-critical LOC test is superior when $\alpha < 4/5$.
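
The arithmetic behind this comparison can be checked directly. Rearranging $(\alpha - 1/2)d < \alpha/\nu$ gives $\alpha < d\nu / (2(d\nu - 1))$, and substituting the scaling relation $d\nu = 2 - a$ yields the bound $(1 - a/2)/(1 - a)$. The short sketch below evaluates both forms in exact rational arithmetic; the function names are hypothetical, and the exponent values plugged in are the conjectured ones quoted in the text.

```python
from fractions import Fraction


def cutoff_from_exponent(a):
    """Upper bound on alpha below which the near-critical test is superior,
    written in terms of the cluster-number exponent a: (1 - a/2) / (1 - a)."""
    a = Fraction(a)
    return (1 - a / 2) / (1 - a)


def cutoff_from_nu(d, nu):
    """The same bound written in terms of d and nu: d*nu / (2*(d*nu - 1)).

    Requires d*nu > 1, which holds whenever nu >= 2/d.
    """
    dn = Fraction(d) * Fraction(nu)
    return dn / (2 * (dn - 1))


# d = 2 with nu = 4/3, i.e. a = -2/3: the bound is 4/5.
two_dim = cutoff_from_exponent(Fraction(-2, 3))
# d = 6 with nu = 1/2, i.e. a = -1: the bound is 3/4.
six_dim = cutoff_from_exponent(-1)
```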

5. The upper level set scan statistic 

For a threshold $t$, let $\mathcal{Q}_m^{(t)}$ denote the (random) class of clusters within $\mathbb{V}_m$ open at $t$,
and let $\mathcal{Q}_m = \bigcup_t \mathcal{Q}_m^{(t)}$, which is also random. Patil and Taillie [41] suggested scanning
the clusters in $\mathcal{Q}_m$. To facilitate a rigorous mathematical analysis of its performance, we
consider the upper level set (ULS) scan at a given threshold $t$, and use the simple scan
described in Section 3. Specifically, in correspondence with (1), we define the (simple)
ULS scan statistic at threshold $t$ as

$$U_m(t, k_m) = \max\{\sqrt{|K|} \, (\bar X_K - \mu_{0|t}) : K \in \mathcal{Q}_m^{(t)}, \ |K| \geq k_m\}, \tag{6}$$

where $\mu_{0|t}$ (resp., $\sigma_{0|t}^2$) is the mean (resp., variance) of $X_v \mid X_v > t$ when $X_v \sim F_0$, and
$(k_m)$ is a non-decreasing sequence of positive integers. The ULS scan statistic of Patil
and Taillie [41] corresponds (in its simple form) to

$$\mathrm{ULS}_m = \max_{t} \frac{U_m(t, k_m)}{\sigma_{0|t}}. \tag{7}$$

If $\mu_{0|t}$ and/or $\sigma_{0|t}^2$ are not available, we may use their empirical versions based on the
$X_v$ that survive the threshold $t$. We restrict the scan to clusters of size at least $k_m$ to
increase power, because the behavior of $U_m(t)$ is, as we show later, completely driven by
the smallest open clusters that are scanned, at least when $t$ is subcritical. We present
the rest of our discussion in terms of subcritical, supercritical, and near-critical choices
of threshold. We then conclude with a result on the performance of the ULS scan test
across all thresholds.
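
The ULS scan at a single threshold can be sketched in a few lines. The code below assumes a two-dimensional grid with 4-neighbor adjacency and takes the clusters open at $t$ to be the connected components of the thresholded grid, following the description in the introduction. The conditional null mean $\mu_{0|t}$ is passed in by the caller as `mu0_t`, and returning `0.0` when no component reaches the minimum size is a convention chosen for this sketch. Both function names are hypothetical.

```python
import math
from collections import deque


def open_clusters(grid, t):
    """Yield the connected components (lists of nodes) of {v : grid[v] > t}."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or grid[i][j] <= t:
                continue
            seen[i][j] = True
            queue, component = deque([(i, j)]), []
            while queue:
                a, b = queue.popleft()
                component.append((a, b))
                for x, y in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= x < rows and 0 <= y < cols
                            and not seen[x][y] and grid[x][y] > t):
                        seen[x][y] = True
                        queue.append((x, y))
            yield component


def uls_scan_at_threshold(grid, t, k_min, mu0_t):
    """max of sqrt(|K|) * (mean of K - mu0_t) over open components K with
    |K| >= k_min; returns 0.0 if no component qualifies (our convention)."""
    stats = []
    for K in open_clusters(grid, t):
        if len(K) >= k_min:
            mean_K = sum(grid[i][j] for i, j in K) / len(K)
            stats.append(math.sqrt(len(K)) * (mean_K - mu0_t))
    return max(stats, default=0.0)
```

Scanning across thresholds then amounts to calling `uls_scan_at_threshold` for each candidate $t$ (for example, each distinct observed value), dividing by the corresponding $\sigma_{0|t}$, and taking the maximum.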



5.1. Subcritical threshold 

We start by describing the behavior of $U_m(t, k_m)$ under the null. Let $F_{\theta|t}$ denote the
distribution of $X_v \mid X_v > t$ under $F_\theta$, and let $\mu_{\theta|t}$ and $\Lambda^*_{\theta|t}$ denote its mean and rate
function, respectively. Also, when $0 < \beta < 1/\zeta_{p_\theta(t)}$, or $\beta = 0$ and $F_0 \in \mathrm{AEP}(b, C)$ for some
$b \geq 2$ and $C > 0$, let $\gamma_{\theta|t}(\beta) := \gamma(F_{\theta|t}, \mu_{0|t}, \zeta_{p_\theta(t)}, \beta)$, where $\gamma$ is the function defined in
Lemma A.9. Note that $\gamma_{\theta|t}(\beta)$ can be computed explicitly in some cases, like the normal
location model, and $\gamma_{\theta|t}(\beta) \sim (\mu_{\theta|t} - \mu_{0|t})^2 / \zeta_{p_\theta(t)}$ when $\theta \to \theta_c(t)$, defined (when it exists)
as the solution to $p_\theta(t) = p_c$.

Lemma 5. Assume that $\theta \geq 0$ and $t$ is fixed such that $0 < p_\theta(t) < p_c$ and that
$k_m / \log m \to d\beta$ for some $\beta \geq 0$. Then, under $F_\theta$ on $\mathbb{V}_m$, the following holds in
probability:

1. If $\beta > 1/\zeta_{p_\theta(t)}$, then $U_m(t, k_m) = 0$ for $m$ large enough.

2. If $0 < \beta < 1/\zeta_{p_\theta(t)}$, then

$$(\log m)^{-1/2} \, U_m(t, k_m) \to (d \, \gamma_{\theta|t}(\beta))^{1/2}.$$

3. If $\beta = 0$ and $F_0 \in \mathrm{AEP}(b, C)$ for some $b > 1$ and $C > 0$, then

(a) If $b \geq 2$, the convergence in Part 2 applies;

(b) If $b < 2$,

$$k_m^{1/b - 1/2} (\log m)^{-1/b} \, U_m(t, k_m) \to (d/C)^{1/b}.$$

In the last case, where $\beta = 0$, the behavior of $U_m(t)$ is influenced by the very large
deviations of $F_{\theta|t}^{*k}$ for $k \geq k_m$. (The symbol $*$ denotes convolution.) We choose to state
a result for AEP distributions, for which the very large deviations resemble the large
deviations.

Based on Lemma 5, we establish the performance of the ULS scan statistic. We start by arguing that choosing $k_m$ such that $k_m/\log m \to 0$ leads to a test that may potentially have less power than the test based on the largest cluster after thresholding. Indeed, the behavior of the ULS scan statistic does not depend on $\theta$ as long as $\theta < \theta_c(t)$.

Proposition 2. Assume that $F_0 \in \mathrm{AEP}(b, C)$ for some $b \in (1,2)$ and $C > 0$. In hypercube detection, the test based on $U_m(t,k_m)$, with $t$ fixed such that $0 < p_0(t) < p_c$ and $k_m/\log m \to 0$, is asymptotically powerless if $\limsup_{m\to\infty} \theta_m < \theta_c(t)$.

For example, in the setting just described with $d = 1$, the ULS scan test has (asymptotically) no power unless $\theta_m \to \infty$, whereas the test based on the size of the largest cluster after thresholding is, by Theorem 1, asymptotically powerful if $\liminf_{m\to\infty} \theta_m$ is large enough. We therefore choose a sequence $k_m$ comparable in magnitude to $\log m$ and state the performance of the ULS scan test in this case.

Theorem 5. In hypercube detection, the test based on $U_m(t,k_m)$, with $t$ fixed such that $0 < p_0(t) < p_c$ and $k_m/\log m \to d\beta$ with $0 < \beta < 1/\zeta_{p_0(t)}$, is asymptotically powerful if $\liminf_{m\to\infty} \theta_m > \theta_*(t)$ and asymptotically powerless if $\limsup_{m\to\infty} \theta_m < \theta_*(t)$, where $\theta_*(t)$ is the unique solution to $\alpha\,\gamma_{\theta|t}(\beta) = \gamma_{0|t}(\beta)$.

Note that $\theta_*(t)$ is well defined by Lemma A.10 and that $\theta_*(t) < \theta_c$ as long as $\alpha > 0$. In any case, the test based on $U_m(t, k_m)$ with a subcritical threshold $t$ is, in the setting of hypercube detection, asymptotically powerless when $\theta_m \to 0$, just like the LOC test. In essence, the two tests are qualitatively comparable in this setting. This is also true in the context of path detection. Let $\gamma^1_{\theta|t}(\beta)$ denote $\gamma_{\theta|t}(\beta)$ in dimension 1.

Theorem 6. In path detection, the test based on $U_m(t,k_m)$, with $t$ fixed such that $0 < p_0(t) < p_c$ and $k_m/\log m \to d\beta$ with $0 < \beta < 1/\zeta_{p_0(t)}$, is asymptotically powerful if $\liminf_{m\to\infty} \theta_m > \theta^1_*(t)$ and asymptotically powerless if $\limsup_{m\to\infty} \theta_m < \theta_*(t)$, where $\theta^1_*(t)$ (resp., $\theta_*(t)$) is the unique solution to $\alpha\,\gamma^1_{\theta|t}(\beta) = \gamma_{0|t}(\beta)$ (resp., $\alpha\,\gamma_{\theta|t}(\beta) = \gamma_{0|t}(\beta)$).

As in Theorem 2, the result is not as sharp. 

Qualitatively, we see that the subcritical ULS scan and LOC tests perform comparably for both hypercube detection and path detection.

5.2. Supercritical threshold 

Here we consider the choice of a supercritical threshold, where $t$ is fixed such that $p_0(t) > p_c$. We already saw in Section 4.2 that the largest open cluster is unique and occupies a non-negligible fraction of the entire network. This is actually true both under the null and under an alternative. The ULS scan test based solely on the largest open cluster is comparable to the test based on the grand mean after thresholding. In turn, assuming $t$ is fixed, this test is asymptotically powerful when $m^{(\alpha - 1/2)d}\,\theta_m \to \infty$, and asymptotically powerless if $\alpha < 1/2$ and $\theta_m$ is bounded. (This is easily seen using Chebyshev's inequality.) This is comparable to the LOC test at supercriticality.

In general, the ULS scan statistic includes other (smaller) open clusters. The story of the second-largest cluster of supercritical percolation in a box is not yet complete, and for this reason the behavior of the ULS scan statistic remains incompletely understood. The difficulty arises from the possibility that the second-largest cluster in $\mathbb{V}_m$ might lie at its boundary. Whether or not this occurs depends on the outcome of a calculation (yet to be done) of energy/entropy type involving so-called "droplets" near the boundary of $\mathbb{V}_m$ (see, e.g., Bodineau, Ioffe and Velenik [5]). To simplify the discussion, we finesse this problem by working where necessary on $\mathbb{V}_m$ with toroidal boundary conditions. That is, whenever we make statements concerning supercritical percolation on the graph $\mathbb{V}_m$, we may add edges connecting sites on its boundary as follows: when $d = 2$, for $k = 1, 2, \ldots, m$, an additional edge is placed between site $(1,k)$ and site $(m,k)$, and similarly between $(k,1)$ and $(k,m)$.
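The wrap-around edges just described amount to taking coordinates modulo $m$. A minimal sketch in Python, with sites labeled by tuples in $\{1, \ldots, m\}^d$ (the function name `torus_neighbors` is hypothetical, chosen for this illustration):

```python
def torus_neighbors(site, m):
    """Nearest neighbors of `site` in {1, ..., m}^d with toroidal boundary conditions."""
    neighbors = []
    for axis in range(len(site)):
        for step in (-1, 1):
            coord = list(site)
            # shift to 0-based, step with wrap-around, shift back to 1-based
            coord[axis] = (coord[axis] - 1 + step) % m + 1
            neighbors.append(tuple(coord))
    return neighbors
```

For $m \ge 3$ every site then has exactly $2d$ distinct neighbors, so no site is a boundary site.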

In proving exact asymptotics for test statistics under the null, we assume toroidal 
boundary conditions. Our results on asymptotic power do not require such exact results 
but require only orders of magnitude, which do not need the toroidal assumption. We 
emphasize that similar results are expected to hold with "free" (i.e., without the extra 
edges) rather than toroidal boundary conditions. Once the percolation picture is better 
understood, such results will follow in the same manner as those presented in this paper. 
Our results for the torus are also valid if instead we discount open clusters that touch 
the boundary of Vm. Details of this are omitted, and the proofs are essentially the same. 

When working on the torus, the second-largest cluster is controlled through the following calculation. Cerf [8] proved that the limit

$$\delta_p := -\lim_{k\to\infty} k^{-(d-1)/d} \log \mathbb{P}(\infty > S \ge k) = -\lim_{k\to\infty} k^{-(d-1)/d} \log \mathbb{P}(S = k) \qquad (8)$$

exists, with $0 < \delta_p < \infty$ for all fixed $p \in (p_c, 1)$. The dependency on $d$ is left implicit.

A result similar to Lemma 5 holds with $\delta_p$ playing the role of $\zeta_p$ and the exponent of $\log m$ changed in places. It turns out that we need this result only when $\theta = 0$. For $\beta > 0$ and a supercritical $t$, let $\gamma_{0|t}(\beta) := \gamma(F_{0|t}, \mu_{0|t}, 0, \beta)$, defined in Lemma A.9.

Lemma 6. Assume that $t$ is fixed such that $p_c < p_0(t) < 1$ and that $k_m/\log m \to d\beta$ and $k_m^{(d-1)/d}/\log m \to d\beta'$ for some $0 \le \beta, \beta' \le \infty$. Then, under the null, the following holds in probability on the torus $\mathbb{V}_m$:

1. If $\beta' > 1/\delta_{p_0(t)}$, then $U_m(t,k_m) = O(1)$.

2. If $0 \le \beta' < 1/\delta_{p_0(t)}$ and $\beta = \infty$, then

$$(\log m)^{-1/2}\, U_m(t, k_m) \to \sigma_{0|t}\,[2d(1 - \beta'\delta_{p_0(t)})]^{1/2},$$

where $\sigma^2_{0|t} := \mathrm{Var}(F_{0|t})$.

3. If $\beta < \infty$, then the conclusions of Lemma 5 apply. (Note that $\zeta_{p_0(t)} = 0$.)

Based on Lemma 6, we obtain the following result on the performance of the ULS scan test at supercriticality. As before, we restrict ourselves to the case where $U_m(t, k_m)$ is of order $(\log m)^{1/2}$. We also choose to state a simple result instead of a more precise result with multiple subcases. This result holds irrespective of the type of boundary condition assumed on $\mathbb{V}_m$.

Theorem 7. In hypercube detection, the test based on $U_m(t,k_m)$, with $t$ fixed such that



We also mention that the equivalent of Theorem 6 holds here as well. 

The improvement of the supercritical ULS scan test compared with the supercritical LOC test is a weaker requirement on $\theta_m$ by a logarithmic factor. Thus, this test's performance is still much worse than that of the scan statistic when detecting hypercubes.

5.3. Critical threshold 

If we choose a threshold as described in Section 4.3, and if (5) is true, then the power of the ULS scan statistic is greatly improved, as in the case of the LOC test. In fact, it can be proven that Theorem 4 remains valid with $S(t_m)$ replaced with $U_m(t_m, k_m)$, as long as $k_m = o(m^{\alpha d})$, so that the largest open cluster under the alternative is scanned. This boils down to showing that under the null, the ULS scan statistic is at most a power of $\log m$, which we do in Lemma 7 below. However, the ULS scan test does not seem to offer any substantial gain in power over the LOC test, given that $\theta_m$ is still required to be large enough to change the regime of the percolation process within an alternative $K$ from subcritical to supercritical. That said, actually proving this would require information on the smaller open clusters near criticality, which is scarce and very difficult to obtain (see Borgs et al. [6] for some partial results and postulates).

5.4. Across all thresholds 

Finally, we discuss the (simple) ULS scan test across all thresholds, as suggested in Patil and Taillie [41]. To take advantage of a phase transition near criticality, we assume, as in Section 4.3, that there exists $t_c$ such that $p_0(t_c) = p_c$ and that $p_0(t)$ is a continuous function of $t$ in a neighborhood of $t_c$. We also assume that (5) holds. In Proposition 2, we showed that scanning small clusters may lead to a decrease in power. For this reason, and also to facilitate the analysis, we limit ourselves to clusters of size at least $k_m$; that is, we consider the test based on

$$\mathrm{ULS}_m(k_m) = \max_{t} \frac{U_m(t,k_m)}{\sigma_{0|t}}, \qquad (9)$$

where, for definiteness, $U_m(t,k_m)$ is calculated on the torus $\mathbb{V}_m$ when $t < t_c$. Let $\Gamma(\beta) = \inf_t \gamma_{0|t}(\beta)/\sigma^2_{0|t}$, where, in congruence with Sections 5.1 and 5.2,

$$\gamma_{0|t}(\beta) = \begin{cases} \gamma(F_{0|t}, \mu_{0|t}, \zeta_{p_0(t)}, \beta), & t > t_c, \\ \gamma(F_{0|t}, \mu_{0|t}, 0, \beta), & t < t_c, \end{cases}$$

with $\gamma$ being the function defined in Lemma A.9. We first establish the behavior of $\mathrm{ULS}_m(k_m)$ under the null.

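Lemma 7. Let $k_m \sim \beta \log m$ where $\beta > 0$, and let $t_\beta$ be such that $d/\beta < \zeta_{p_0(t_\beta)} < \infty$. Define $\eta(\beta) := \sup\{\sigma_{0|t}/\sigma_{0|s} : s \le t \le t_\beta\}$. With probability tending to 1, under $F_0$,

$$\limsup_{m\to\infty}\, (\log m)^{-1/2}\, \mathrm{ULS}_m(k_m) \le \eta(\beta)\,(d\,\Gamma(\beta))^{1/2}.$$

If in addition, either $\sigma_{0|t}$ is non-decreasing in $t$ or $F_0$ has no atoms on $(-\infty, t_\beta]$, then, in probability under $F_0$,

$$(\log m)^{-1/2}\, \mathrm{ULS}_m(k_m) \to (d\,\Gamma(\beta))^{1/2}.$$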

In fact, a result as precise as Lemma 7 is superfluous, given the behavior of the ULS scan 
statistic under the alternative at supercriticality and near-criticality, which is polynomial 
in m. The next theorem does not require the use of toroidal boundary conditions. 

Theorem 8. In hypercube detection and assuming that (5) holds, the test based on $\mathrm{ULS}_m(k_m)$, with $k_m = [\beta \log m]$ for some $\beta > 0$, is asymptotically powerful if $\theta_m m^{\lambda} \to \infty$, for some $0 < \lambda < \alpha/\nu$ satisfying $\lambda < (\alpha - 1/2)d$ if $\alpha > 1/2$.

Thus, scanning all thresholds elicits the best performance of the LOC tests. Nevertheless, the overall test is still suboptimal when detecting hypercubes compared with the scan statistic. We mention in passing that the same result holds for the simpler test that scans only the largest open cluster at each threshold.

6. Implementation and numerical experiments 

The scan test has been shown to be near-optimal in a wide variety of settings, differing in terms of both network structure and cluster class (Arias-Castro, Candes and Durand [1]; Arias-Castro, Donoho and Huo [3]). It is computationally demanding, however. For the simple situation of detecting a hypercube, the scan statistic can be computed in $O(N \log N)$ flops, where $N := m^d$ is the network size, if the size of the hypercube is known. If one scans over all possible hypercubes, then computing the scan statistic requires $O(N^2 \log N)$ flops. For nonparametric shapes, the computational cost is even higher; in fact, for the problem of detecting a loopless path, computing the scan statistic corresponds to the reward-budget problem of DasGupta et al. [13], shown there to be NP-hard. Because the scan statistic is so computationally burdensome, the cluster class is most often taken to be parametric in practice, even though the underlying clusters may take a much wider range of shapes. For instance, discs are the prevalent shape used in disease outbreak detection (Kulldorff and Nagarwalla [33]), with variants such as ellipses (Hobolth, Pedersen and Jensen [26]; Kulldorff et al. [32]). For a wide range of parametric shapes, Arias-Castro, Donoho and Huo [3] recommended a multiscale approximation to the scan statistic. Efforts to move beyond parametric models include tree-based approaches (Kulldorff, Fang and Walsh [31]), simulated annealing (Duczmal and Assunção [16]) and an exhaustive search among arbitrarily shaped clusters of small size (Tango and Takahashi [51]).
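For the simplest case above, a square of known side length in the two-dimensional lattice, the scan over all positions can be sketched as follows. This sketch uses a summed-area table rather than the method behind the $O(N \log N)$ count quoted above, and it assumes that the observations are standardized (mean 0, variance 1 under the null), so that the scan statistic is the largest window sum divided by the side length $\ell$. The name `square_scan_statistic` is hypothetical.

```python
def square_scan_statistic(x, ell):
    """Max over ell-by-ell windows of (window sum) / ell, via a summed-area table."""
    rows, cols = len(x), len(x[0])
    # prefix[i][j] holds the sum of x[a][b] over a < i and b < j
    prefix = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (x[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    best = float("-inf")
    for i in range(rows - ell + 1):
        for j in range(cols - ell + 1):
            total = (prefix[i + ell][j + ell] - prefix[i][j + ell]
                     - prefix[i + ell][j] + prefix[i][j])
            best = max(best, total / ell)  # sqrt(ell**2) = ell nodes-normalization
    return best
```

Each window sum costs a constant number of operations once the table is built, so a single side length is handled in time linear in the number of nodes; scanning over all side lengths multiplies this by the number of lengths tried.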



The LOC test does not assume any parametric form for the anomalous cluster, and in that sense is nonparametric. Its computational complexity at a given threshold is of order the number of nodes plus the number of edges in the network (Cormen et al. [10]), and so of order $O(N)$ flops for the square lattice.
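The linear-time claim corresponds to a single graph traversal. A minimal sketch for the two-dimensional square lattice, written as an iterative depth-first search (the function name `largest_open_cluster_size` is hypothetical; sites with value above the threshold are treated as open):

```python
def largest_open_cluster_size(x, t):
    """Size of the largest 4-connected cluster of sites with x[i][j] > t."""
    rows, cols = len(x), len(x[0])
    visited = [[False] * cols for _ in range(rows)]
    largest = 0
    for i in range(rows):
        for j in range(cols):
            if x[i][j] > t and not visited[i][j]:
                visited[i][j] = True
                stack, size = [(i, j)], 0
                while stack:  # each open site is pushed and popped exactly once
                    a, b = stack.pop()
                    size += 1
                    for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                        if (0 <= c < rows and 0 <= d < cols
                                and x[c][d] > t and not visited[c][d]):
                            visited[c][d] = True
                            stack.append((c, d))
                largest = max(largest, size)
    return largest
```

Every site is visited once and every edge is examined at most twice, which gives the nodes-plus-edges cost.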

The ULS scan statistic is nonparametric as well. Computing $U_m(t,k_m)$ requires determining $Q_m$, which takes $O(N)$ flops, and then scanning over $Q_m$. Because the clusters in $Q_m$ do not intersect, scanning over them takes order $O(N)$ flops. Therefore, computing $\mathrm{ULS}_m$ can be done in $O(M \cdot N)$ flops, where $M$ is the number of distinct values at the nodes. Patil and Taillie [42] argued that this can be done faster by using the tree structure of $Q^*_m$, where the root is the entire network $\mathbb{V}_m$ and a cluster $K \in \mathcal{K}_m(t_j)$ is the parent of any cluster $L \in \mathcal{K}_m(t_{j+1})$ such that $L \subset K$, where $t_1 < \cdots < t_M$ denote the distinct values at the nodes.
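The nesting of clusters across thresholds can also be exploited with a union-find sweep, which is a different technique from the tree construction of Patil and Taillie and is offered here only as an illustrative sketch. Sites are added in decreasing order of their values and merged with neighbors already present; for simplicity the sketch tracks only the size of the largest cluster of sites with value at least $t$ (equivalently, value above the next smaller distinct value) at each distinct value $t$. The function name `largest_cluster_by_threshold` is hypothetical.

```python
def largest_cluster_by_threshold(x):
    """For each distinct value t, in decreasing order, return (t, size of the
    largest 4-connected cluster of sites with value >= t)."""
    rows, cols = len(x), len(x[0])
    parent, size = {}, {}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    sites = sorted(((x[i][j], i, j) for i in range(rows) for j in range(cols)),
                   reverse=True)
    result, largest, idx = [], 0, 0
    while idx < len(sites):
        t = sites[idx][0]
        while idx < len(sites) and sites[idx][0] == t:
            _, i, j = sites[idx]
            parent[(i, j)], size[(i, j)] = (i, j), 1
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in parent:  # neighbor already open at this level
                    ra, rb = find((i, j)), find(nb)
                    if ra != rb:
                        if size[ra] < size[rb]:
                            ra, rb = rb, ra
                        parent[rb] = ra  # union by size
                        size[ra] += size[rb]
            largest = max(largest, size[find((i, j))])
            idx += 1
        result.append((t, largest))
    return result
```

After the initial sort, the sweep costs nearly linear time in $N$ for all $M$ thresholds together. Extending it to the ULS scan statistic would require carrying the running sum of each cluster as well as its size.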

We complement our theoretical analysis with some small-scale numerical experiments. Specifically, we explore the power properties of the LOC test of Section 4 and the ULS scan test of Section 5 in the context of detecting a hypercube in the two-dimensional square lattice. Patil, Modarres and Patankar [40] are developing sophisticated software implementing the ULS scan statistic for use in real-life situations, with more recent variations by Patil, Joshi and Koli [38]. However, this software is not yet available, so we implemented our own (basic) routines.

We used the statistical software R (R Core Team [48]) with the package igraph (Csardi [11]). Our (basic) implementation of the ULS scan statistic for a given threshold is much slower than both the scan statistic with a given mask and the LOC statistic, especially when there is no constraint on the size of the open clusters to be scanned, that is, when $k_m = 1$. In all of our experiments, we chose the square lattice in dimension $d = 2$ with side length $m = 500$ for a total of 250,000 nodes, and we considered three alternatives: squares of side length $\ell \in \{10, 50, 100\}$, corresponding roughly to $\alpha \in \{0.4, 0.7, 0.8\}$. The squares were fixed away from the boundary of the lattice, given that the methods are essentially location-independent. (This is rigorously true of the scan statistic.) We assessed the performance of a method in a given situation by estimating its risk, which we define as the sum of the probabilities of type I and type II errors optimized over all rejection regions.
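The risk just defined can be estimated from simulated values of a test statistic under the null and under an alternative. The sketch below assumes, as a simplification, that rejection regions are one-sided (reject when the statistic is at least a cutoff $c$) and minimizes the empirical sum of error rates over all cutoffs; the function name `estimated_risk` is hypothetical.

```python
def estimated_risk(null_stats, alt_stats):
    """Minimum over cutoffs c of  P0_hat(T >= c) + P1_hat(T < c)."""
    # Only cutoffs at observed values (or +infinity: never reject) can matter.
    cutoffs = sorted(set(null_stats) | set(alt_stats))
    cutoffs.append(float("inf"))
    best = 2.0
    for c in cutoffs:
        type1 = sum(s >= c for s in null_stats) / len(null_stats)
        type2 = sum(s < c for s in alt_stats) / len(alt_stats)
        best = min(best, type1 + type2)
    return best
```

A risk near 0 means the two samples are almost perfectly separated by some cutoff, and a risk near 1 means the test does no better than ignoring the data.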

We first ran some experiments to quickly assess the power of the scan test. We found that the test agrees very well with the theory (i.e., Lemma 3), which we already knew from previous experience. Specifically, we assumed a normal location model and simulated 100 realizations of the null and each of the three alternatives with $\theta \in \{j/\ell : j = 1,3,5,7,9\}$ (see Figure 2).

Figure 2. The risk of the scan test against each of the three alternatives. The x-axis is $\theta$, and the y-axis is the estimated risk based on 100 replicates.

Next, we performed some larger experiments to assess the power of the LOC test. We simply assumed a site percolation model with probability $p \in \{0.05, 0.10, \ldots, 0.90, 0.95\}$. Note that $p_c$ is not known exactly for site percolation in the square lattice, although $p_c \approx 0.593$ from extensive numerical experiments (Feng, Deng and Blote [19]). We simulated the null and each of the three alternatives with $q \in \{0.05, 0.10, \ldots, 0.90, 0.95\}$, $q > p$, within the anomalous cluster. We replicated each situation 1000 times. The risk curves are shown in Figure 3. The test seems to behave similarly above and below criticality. At near-criticality, the test is rather erratic. However, when the size of the anomalous cluster is large enough, $\ell = 100$, the risk curve is steepest just under $p_c$, at $p = 0.55$ in our experiments, with full power against $q \ge 0.65$. Figure 4 shows boxplots of the test statistic for the case where $\ell = 100$ and $p = 0.40$ (subcritical), $p = 0.55$ (near-critical), and $p = 0.70$ (supercritical).
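The simulation design just described can be sketched in a few lines. The sketch below draws one realization of the site percolation model on an $m \times m$ lattice with open probability $p$ outside and $q$ inside a centered $\ell \times \ell$ square; placing the square at the center and the function name `simulate_site_percolation` are assumptions made for this illustration, since the text only says that the squares were fixed away from the boundary.

```python
import random


def simulate_site_percolation(m, p, ell, q, rng):
    """Return an m-by-m grid of booleans (True = open site)."""
    lo = (m - ell) // 2  # top-left corner of the centered anomalous square
    grid = [[rng.random() < p for _ in range(m)] for _ in range(m)]
    for i in range(lo, lo + ell):
        for j in range(lo, lo + ell):
            grid[i][j] = rng.random() < q  # redraw with the elevated probability
    return grid


# Example: one draw from the alternative with ell = 10 on a small lattice.
example = simulate_site_percolation(50, 0.40, 10, 0.80, random.Random(1))
```

Setting `q` equal to `p` gives a draw from the null; the size of the largest open cluster of each draw is then the LOC statistic whose risk is estimated over replicates.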

Figure 3. The risk of the LOC test against each of the three alternatives. The x-axis is the percolation probability $q$ on the anomalous cluster, and the y-axis is the estimated risk based on 1000 replicates. Each curve corresponds to a different percolation probability $p$.

Figure 4. The size of the largest open cluster in $\log_{10}$ scale (y-axis) versus the percolation probability $q$, for the alternative $\ell = 100$ and $p \in \{0.40, 0.55, 0.70\}$ (from left to right). Each boxplot represents 1000 replicates.

If we were to use this test in the context of a normal location model, then the correspondence would be $t = \bar\Phi^{-1}(p)$ (the threshold) and $\theta = t - \bar\Phi^{-1}(q)$, where $\bar\Phi$ denotes the normal survival distribution function. Figure 5 plots the risk curves in this context for $p \in \{0.40, 0.50, 0.55, 0.60, 0.70\}$. In particular, the test at near-criticality with $t = \bar\Phi^{-1}(0.55) = -0.126$ has full power against the alternative with $\ell = 100$ and $\theta = 0.26$.
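The correspondence between percolation probabilities and the normal location model is easy to check numerically. The sketch below uses Python's standard `statistics.NormalDist`; the helper names `survival_inverse` and `threshold_and_shift` are hypothetical.

```python
from statistics import NormalDist


def survival_inverse(p):
    """Inverse of the standard normal survival function: the x with P(Z > x) = p."""
    return NormalDist().inv_cdf(1.0 - p)


def threshold_and_shift(p, q):
    """Threshold t with P_0(X > t) = p and shift theta with P_theta(X > t) = q."""
    t = survival_inverse(p)
    theta = t - survival_inverse(q)
    return t, theta


t, theta = threshold_and_shift(0.55, 0.65)
print(round(t, 3), round(theta, 2))  # -0.126 0.26
```

With $p = 0.55$ and $q = 0.65$ this reproduces the threshold $-0.126$ and the shift $0.26$ quoted above.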

Figure 5. The risk of the LOC test in the context of a normal location model. The x-axis is $\theta$, and the y-axis is the estimated risk based on 1000 replicates. Each curve corresponds to a different threshold $t$. The solid, dashed, dotted, dot-dashed and long-dashed curves correspond to $p = 0.40$, $0.50$, $0.55$, $0.60$ and $0.70$, respectively.

Finally, we experimented with the ULS scan test. To limit the size of our simulations, we considered alternatives with $\theta = \Phi^{-1}(q)$ with $q \in \{0.55, 0.6, 0.65, 0.70, 0.80, 0.90\}$, where $\Phi$ is the standard normal distribution function, and chose $t = \bar\Phi^{-1}(p)$ with $p \in \{0.40, 0.50, 0.55, 0.60, 0.70\}$ as thresholds. We restricted scanning to open clusters of size not smaller than 1/10 of the size of the largest open cluster, essentially falling in the regime of Part 2 of Lemma 5, and also making the computation much faster. We used 200 replicates. We again see that the risk curve is sharpest near criticality when the size of the anomalous cluster is sufficiently large, here for $\ell \ge 50$. Compared with the LOC test, the ULS scan test has more power at large $\theta$ when the cluster is small, $\ell = 10$ (as predicted), and, more interestingly, slightly more power when the cluster is larger. Compared with the scan statistic, which knows the size and shape of the anomalous cluster, the ULS scan test with the best choice of threshold (corresponding to $p = 0.55$) requires approximately threefold greater signal amplitude.

Figure 6. The risk of the ULS scan test against each of the three alternatives. On the x-axis is $\theta$, and on the y-axis is the estimated risk based on 200 replicates. Each curve corresponds to a different threshold $t$. The solid, dashed, dotted, dot-dashed and long-dashed curves correspond to $p = 0.40$, $0.50$, $0.55$, $0.60$ and $0.70$, respectively.

7. Discussion 

The contribution of this paper is a rigorous mathematical analysis of the performance of the LOC test, independent of and more extensive than that of Davies, Langovoy and Wittich [14] and Langovoy and Wittich [34], and of the ULS scan test, both nonparametric and computationally tractable methods. We made abundant use of percolation theory to establish these results. We compared the power of these tests with that of the scan statistic, which is known to be near-optimal in a wide array of settings. Although these tests are comparable in power with the scan statistic for the detection of a path, they may be substantially less powerful for the detection of a hypercube. Note, however, that the scan statistic is provided with knowledge about the shape and size of the anomalous cluster. In theory, we argued that this was the case based on some heuristics and conjectures from percolation theory. Numerically, this appears to be the case when the anomalous cluster is large enough. In our experiments, the ULS scan test was slightly more powerful than the LOC test, and required a $\theta$ three to four times larger than the scan statistic, which has the advantage of knowing the shape and size of the cluster. This result is promising, and further numerical experiments are needed to evaluate the power of these tests in truly nonparametric settings, because they do not require previous information about cluster shape, and are computationally more feasible in general.

Our theoretical results generalize to other networks that resemble the lattice, with a different critical percolation probability and different functions $\zeta_p$ and $\delta_p$. In particular, we used the self-similarity property of the square lattice and the fact that it has polynomial growth. Our results also generalize to other cluster classes; in the setting of the square lattice, they extend immediately to any class of clusters that includes a hypercube of comparable size (e.g., the class $\mathcal{K}_m$ of clusters $K$ of size $|K| = [m^{\alpha}]^d$), such that there is a hypercube $K_0 \subset K$ with $|K_0|/|K| \ge \omega_m$, where $\omega_m \to 0$ more slowly than any negative power of $m$. In addition, the class might contain clusters of different sizes, although in that case the worst-case risk would be driven by the smallest clusters. Implementation of the scan statistic may be much more demanding in this case. The main results of Section 4 require only that $F_\theta(t)$ be twice differentiable in $(t,\theta)$, with $\partial_\theta F_\theta(t) < 0$ for all $(t,\theta)$, which is the case, for example, for location models and scale models if $F_0$ is twice differentiable with a strictly positive first derivative. With some additional work, we also can obtain results for classes of "thin" clusters as defined in Arias-Castro, Candes and Durand [1]. The key is to understand the percolation behavior within and near such clusters. Some results are available for slabs (Grimmett [21], Theorem 7.2) and more general subgraphs of lattices including "wedges," and these appear to be transferable to other "curved" slabs.

Appendix A: Proofs 

Notation. We write $f_m \sim g_m$ as $m \to \infty$ if $f_m/g_m \to 1$. Similarly, we use $O(\cdot)$ and $o(\cdot)$ and write $f_m \asymp g_m$ as $m \to \infty$ if $f_m = O(g_m)$ and vice versa. We also use their random counterparts, $\sim_P$, $\asymp_P$, $O_P(\cdot)$, and $o_P(\cdot)$. For example, $Z_m = o_P(k_m)$ means that $Z_m/k_m \to 0$ in probability, and $Z_m = O_P(k_m)$ means that $Z_m/k_m$ is bounded in probability, which is to say that $\mathbb{P}(|Z_m| > k_m l_m) \to 0$ as $m \to \infty$ for any $l_m$ satisfying $l_m \to \infty$. We use $1\{A\}$ to denote the indicator function of the set $A$. The maximum of $k$ and $\ell$ is denoted by $k \vee \ell$.

A.l. On the size of percolation clusters 

Here we state and prove some results on the sizes of percolation clusters in $\mathbb{Z}^d$. We start by proving some properties of $\zeta_p$. Recall that $S$ denotes the size of the open cluster at the origin. Besides the limit in (3), the following bound holds for $p < p_c$ and all $k \ge 1$:

F,{S>k)<{l-prj^^^^, (A.l) 

by Grimmett [21], Equation (6.80), adapted to site percolation. 

Lemma A.1. The function $\zeta_p$ defined in (3) is continuous and strictly decreasing over $(0, p_c]$, and satisfies $\lim_{p\to 0} \zeta_p = \infty$ and $\lim_{p\to p_c} \zeta_p = 0$.

Proof. Let $0 < p < p' < 1$. By coupling $\mathbb{P}_p$ and $\mathbb{P}_{p'}$ in the usual way,

$$\mathbb{P}_p(S = k) \ge (p/p')^k\, \mathbb{P}_{p'}(S = k),$$

so that $\zeta_p \le \zeta_{p'} + \log(p'/p)$. Applying Grimmett [21], Theorem 2.38, to the event $\{S \ge k\}$, we find that, as in the proof of Grimmett [21], Equation (6.16), $\zeta_p/\log p \le \zeta_{p'}/\log p'$. In summary,

$$\zeta_{p'}\left(\frac{\log(1/p)}{\log(1/p')} - 1\right) \le \zeta_p - \zeta_{p'} \le \log(p'/p). \qquad (A.2)$$

Therefore, $\zeta_p$ is continuous and strictly decreasing on $(0, p_c)$. Moreover, by fixing $p' \in (0, p_c)$ and letting $p \to 0$, we have

$$\zeta_p \ge \zeta_{p'}\, \frac{\log(1/p)}{\log(1/p')} \to \infty.$$

Finally, by Grimmett [21], Equations (6.83), (6.56), $\zeta_p \to 0 = \zeta_{p_c}$ as $p \uparrow p_c$. □

Next, we prove (4). We do this by standard means, and the claim may be strengthened (see also Grimmett [22]; Hofstad and Redig [52]).

Lemma A.2. Consider site percolation on $\mathbb{Z}^d$ with parameter $p < p_c$, and let $S_m$ denote the size of the largest open cluster within $\mathbb{V}_m$. Then (4) holds, namely

$$\frac{S_m}{\log m} \to \frac{d}{\zeta_p}, \quad \text{in probability}.$$

Proof. Fix $0 < \varepsilon < 1/2$. Let $S^v$ be the size of the open cluster at a node $v \in \mathbb{Z}^d$, which has the same distribution as $S$. We start with the upper bound. By the union bound,

$$\mathbb{P}(S_m \ge k) \le \sum_{v \in \mathbb{V}_m} \mathbb{P}(S^v \ge k) = |\mathbb{V}_m| \cdot \mathbb{P}(S \ge k). \qquad (A.3)$$

Thus, using (3), for $k_m(\varepsilon) := (1 + \varepsilon)(d/\zeta_p)\log m$ and $m$ large enough,

$$\mathbb{P}(S_m \ge k_m(\varepsilon)) \le m^d \exp(-(1 - \varepsilon/2)\zeta_p k_m(\varepsilon)) \le m^{-d\varepsilon/4},$$

and the term on the right-hand side converges to 0.

For the lower bound, consider $N = [m^d/(\log m)^{2d}]$ nodes $v_1, \ldots, v_N \in \mathbb{V}_m$ separated from each other and the boundary of $\mathbb{V}_m$ by at least $(\log m)^2$. Let $k_m(\varepsilon) := (1 - \varepsilon)(d/\zeta_p)\log m$. For sufficiently large $m$, the events $E_i := \{S^{v_i} < k_m(\varepsilon)\}$ are independent. Therefore, using (3), for large $m$,

$$\mathbb{P}(S_m < k_m(\varepsilon)) \le (1 - \mathbb{P}(S \ge k_m(\varepsilon)))^N \le (1 - \exp(-(1 + \varepsilon/2)\zeta_p k_m(\varepsilon)))^N \le \exp(-m^{d\varepsilon/2}/(\log m)^{2d}), \qquad (A.4)$$

and the last term on the right-hand side tends to 0 as $m \to \infty$. □
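For the reader's convenience, the exponent arithmetic behind the two bounds in this proof can be written out. This is a worked check under the definitions of $k_m(\varepsilon)$ given in the proof, with $0 < \varepsilon < 1/2$. For the upper bound, with $k_m(\varepsilon) = (1+\varepsilon)(d/\zeta_p)\log m$,

$$m^d \exp\{-(1 - \varepsilon/2)\zeta_p k_m(\varepsilon)\} = m^{d - d(1 - \varepsilon/2)(1 + \varepsilon)} = m^{-d\varepsilon(1 - \varepsilon)/2} \le m^{-d\varepsilon/4}.$$

For the lower bound, with $k_m(\varepsilon) = (1-\varepsilon)(d/\zeta_p)\log m$ and $N$ of order $m^d/(\log m)^{2d}$,

$$N \exp\{-(1 + \varepsilon/2)\zeta_p k_m(\varepsilon)\} \approx \frac{m^{d - d(1 + \varepsilon/2)(1 - \varepsilon)}}{(\log m)^{2d}} = \frac{m^{d\varepsilon(1 + \varepsilon)/2}}{(\log m)^{2d}} \to \infty,$$

which, combined with $(1 - x)^N \le e^{-Nx}$, shows that the probability that no cluster reaches size $k_m(\varepsilon)$ tends to 0.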

The following result describes the behavior of the size of the open cluster at the origin when $p$ is small. It may be made more precise, but we do not pursue this here.

Lemma A.3. There exists $c > 0$ depending only on $d$ such that, for $p \in (0, (2c)^{-1})$,

$$p^k \le \mathbb{P}_p(S \ge k) \le 2(cp)^k \quad \forall k \ge 1.$$

Proof. An animal is a connected subgraph of $\mathbb{Z}^d$ containing the origin. The lower bound comes from considering the probability that any given animal of size $k$ is open. For the upper bound, by the union bound, we have $\mathbb{P}_p(S = k) \le |A_k| p^k$, where $A_k$ is the set of animals with $k$ vertices. There is a constant $c > 0$ such that $|A_k| \le c^k$, so that

$$\mathbb{P}_p(S \ge k) \le \sum_{j \ge k} c^j p^j = \frac{(cp)^k}{1 - cp} \le 2(cp)^k,$$

when $cp < \tfrac{1}{2}$. □

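We next present a result on the number of open clusters of a given size that is valid for all $p \in (0,1)$.

Lemma A.4. Consider site percolation on $\mathbb{Z}^d$ with parameter $p$, and let $N_m(k)$ denote the number of open clusters of size $k$ within $\mathbb{V}_m$. Then, for $k \ge 1$,

$$\frac{|\mathbb{V}_m(k)|}{k}\, \mathbb{P}(S = k) \le \mathbb{E}(N_m(k)) \le \frac{|\mathbb{V}_m|}{k}\, \mathbb{P}(\infty > S \ge k),$$

where $\mathbb{V}_m(k) := \{k, \ldots, m - k\}^d$. In addition, for $k, \ell \ge 1$,

$$|\mathrm{Cov}(N_m(k), N_m(\ell))| \le 3^{d+1}(k + \ell)^d\, \mathbb{E}(N_m(k \vee \ell)).$$

Thus, for $k \ge 1$,

$$\mathrm{Var}(N_m(k)) \le 6^{d+1} k^d\, \mathbb{E}(N_m(k)).$$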
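Proof. Let $S^v_m$ be the size of the open cluster at $v$ within the box $\mathbb{V}_m$. Then

$$N_m(k) = \sum_{v \in \mathbb{V}_m} X^v(k), \qquad (A.5)$$

where $X^v(k) = k^{-1} 1\{S^v_m = k\}$. We immediately have

$$\mathbb{E}(N_m(k)) \le \sum_{v \in \mathbb{V}_m} \frac{1}{k}\, \mathbb{P}(\infty > S^v \ge k) = \frac{|\mathbb{V}_m|}{k}\, \mathbb{P}(\infty > S \ge k).$$

For the lower bound, we count only nodes away from the boundary, obtaining

$$\mathbb{E}(N_m(k)) \ge |\mathbb{V}_m(k)|\, \frac{1}{k}\, \mathbb{P}(S = k),$$

where $\mathbb{V}_m(k) := \{k, \ldots, m - k\}^d$.

We turn now to the covariances. By (A.5),

$$\mathrm{Cov}(N_m(k), N_m(\ell)) = \sum_{v, w \in \mathbb{V}_m} \mathrm{Cov}(X^v(k), X^w(\ell)) = \sum_{\|v - w\| \le k + \ell} \mathrm{Cov}(X^v(k), X^w(\ell)),$$

because $X^v(k)$ and $X^w(\ell)$ are independent if $\|v - w\| > k + \ell$, where $\|\cdot\|$ denotes the $\ell^\infty$-norm. Now,

$$|\mathrm{Cov}(X^v(k), X^w(\ell))| = |\mathbb{E}(X^w(\ell) \mid X^v(k) = k^{-1}) - \mathbb{E}(X^w(\ell))|\, \mathbb{E}(X^v(k)),$$

so that

$$|\mathrm{Cov}(N_m(k), N_m(\ell))| \le (2k + 2\ell + 1)^d\, \mathbb{E}(N_m(k)),$$

and the second claim of the lemma follows. □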

We now describe some properties of the open clusters within $\mathbb{V}_m$ in the supercritical regime. In this regime, it is known that, with probability 1, there is a unique infinite open cluster in $\mathbb{Z}^d$, denoted by $Q_\infty$ (see, e.g., Grimmett [21], Section 8.2). With high probability, the largest open cluster within $\mathbb{V}_m$ is a subgraph of this infinite open cluster. Next, we present some additional information on its size, $S_m$.



Lemma A.5. Suppose that $p > p_c$. There is a constant $C > 0$ such that, with probability at least $1 - \exp(-Cm^{d-1})$, there is a unique largest open cluster within $\mathbb{V}_m$, and it is a subgraph of $Q_\infty$. Moreover, as $m \to \infty$, its size $S_m$ satisfies

$$\frac{S_m - \mathbb{E}(S_m)}{\sqrt{\mathrm{Var}(S_m)}} \to \mathcal{N}(0, 1) \quad \text{in distribution},$$

with $\mathbb{E}(S_m) \sim \theta_p |\mathbb{V}_m|$ and $\mathrm{Var}(S_m) \sim \sigma^2 |\mathbb{V}_m|$ for some $\sigma^2 > 0$ depending on $(d, p)$.

Proof. For the first part and the limiting behavior of $\mathbb{E}(S_m)$ as $m \to \infty$, see the discussion of Penrose and Pisztora [44], Theorems 4 and 6, and the beginning of this Appendix. For the weak limit and the limit size of the variance of $S_m$, see, for example, Penrose [43], Theorem 3.2. □

We next describe some properties of the smaller open clusters. Let $S_m^{(2)}$ be the size of the largest open cluster of $\mathbb{Z}^d$ that is contained entirely within $\mathbb{V}_m$.

Lemma A.6. Suppose that $p > p_c$. There exists a positive constant $\delta_p$ such that

$$\frac{S_m^{(2)}}{(\log m)^{d/(d-1)}} \to \left(\frac{d}{\delta_p}\right)^{d/(d-1)}, \quad \text{in probability}.$$

For any $c > 0$, there exists $\sigma_i = \sigma_i(p, c) > 0$ such that the following holds: With probability tending to 1, there exist at least $\sigma_1 m^d \exp[-\sigma_2 (\log m)^{(d-1)/d}]$ open clusters of size $[c \log m]$ of $\mathbb{Z}^d$ lying within $\mathbb{V}_m$.

Our results on exact asymptotics in the supercritical phase concern $\mathbb{V}_m$ with toroidal boundary conditions. One effect of removing the boundary from $\mathbb{V}_m$ is that the asymptotics of the largest cluster coincide with those of $S_m$, and similarly for the second-largest cluster $S_m^{(2)}$. In the proof of Theorem 7, we need an upper bound on the size of the second-largest cluster inside a box with "free" boundary conditions. We do not explore this in detail here, because it relies on extensions of arguments of Kesten and Zhang [28] (see also Grimmett [21], Proof of Theorem 8.65), which have not yet been fully explored in the literature. Instead, we note that the second-largest open cluster in a supercritical percolation model on $\mathbb{V}_m$ with free boundary conditions has size of order $O_P((\log m)^{d/(d-1)})$.

Proof of Lemma A.6. It was proven by Cerf [8] that the limit

$$\delta_p := -\lim_{k\to\infty} k^{-(d-1)/d} \log \mathbb{P}(S = k) \qquad (A.6)$$

exists and is strictly positive and finite when $p_c < p < 1$. It is elementary that $\delta_p$ thus defined is equal to that of (8) (see also Grimmett [21], Section 8.6). The first part of the lemma follows by the same proof as used in Lemma A.2.

As in the proof of Lemma A.4, the mean number $\mu_m$ of clusters of size $k := [c \log m]$ satisfies

$$\frac{|\mathbb{V}_m(k)|}{[c \log m]} \exp(-\delta_1 (c \log m)^{(d-1)/d}) \le \mu_m \le \frac{|\mathbb{V}_m|}{[c \log m]} \exp(-\delta_2 (c \log m)^{(d-1)/d})$$

for positive constants $\delta_i$. The number of such clusters has variance no larger than $C k^d \mu_m$ for some $C < \infty$. The claim follows by Chebyshev's inequality. □



A. 2. Some distributional properties 

Here we present some results for AEP and exponential families of distributions. Our first 
result is on the size of the maximum of an i.i.d. sample from an AEP distribution. 
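The limit in the lemma that follows, namely that the maximum of $n$ i.i.d. draws divided by $(\log n)^{1/b}$ tends to $C^{-1/b}$, can be illustrated deterministically in one simple case. The sketch assumes that the standard exponential distribution, whose survival function is $e^{-x}$, belongs to $\mathrm{AEP}(1, 1)$ (that is, that AEP$(b, C)$ means $\log \bar F(x) \sim -Cx^b$), and computes the exact median of the maximum rather than simulating. The function name `median_of_max_exponential` is hypothetical.

```python
from math import expm1, log


def median_of_max_exponential(n):
    """Median of max(X_1, ..., X_n) for i.i.d. Exp(1) variables.

    Solves (1 - exp(-x))**n = 1/2, i.e. exp(-x) = 1 - 2**(-1/n),
    using expm1 for accuracy when n is large.
    """
    return -log(-expm1(-log(2.0) / n))


for n in (10**3, 10**6, 10**12):
    # The ratio approaches C**(-1/b) = 1 at the slow rate 1 / log(n).
    print(n, median_of_max_exponential(n) / log(n))
```

The slow (logarithmic) rate of convergence visible in the printed ratios is typical of extreme-value limits of this kind.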

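Lemma A.7. Let $F \in \mathrm{AEP}(b, C)$ for some $b > 0$ and $C > 0$. Then, for $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} F$,

$$\frac{\max(X_1, \ldots, X_n)}{(\log n)^{1/b}} \to C^{-1/b}, \quad \text{in probability}.$$

Proof. Fix $\varepsilon \in (0, 1)$ and define $x_n(\varepsilon) = ((1 - \varepsilon)(\log n)/C)^{1/b}$. For $n$ large enough, we have, by independence,

$$\mathbb{P}(\max(X_1, \ldots, X_n) \le x_n(\varepsilon)) \le (1 - \bar F(x_n(\varepsilon)))^n \le (1 - \exp(-(1 + \varepsilon) C x_n(\varepsilon)^b))^n \le \exp(-n^{\varepsilon^2}) \to 0.$$

Now redefine $x_n(\varepsilon) = ((1 + \varepsilon)(\log n)/C)^{1/b}$. For $n$ large enough, we have, by the union bound,

$$\mathbb{P}(\max(X_1, \ldots, X_n) > x_n(\varepsilon)) \le n \bar F(x_n(\varepsilon)) \le n \exp(-(1 - \varepsilon/3) C x_n(\varepsilon)^b) = n^{-\varepsilon(2 - \varepsilon)/3} \to 0. \qquad □$$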



We next describe the behavior at infinity of the logarithmic moment-generating func- 
tion and rate function of an AEP distribution. 



Lemma A. 8. Let F e AEP(6, C) for some b>l and C > 0, with logarithmic moment- 
generating function A and rate function A*. Then, as 9 ^ oo, 

5»-f>/(fa-i)A(0)^C(fe-l)(Cfe)-''/(''-^\ h>l- (A.7) 
{\og{l/{C-e)))-^K{e)^l, 6 = 1; (A.8) 

and, as x — oo, 

x-^A*(x)->C. (A.9) 



Proof. Let $\varphi$ be the moment-generating function of $F$ and write $\bar F := 1 - F$. We focus on the upper bound in (A.7) (obtaining the bound in (A.8) is analogous) and deduce the lower bound in (A.9). Let $b > 1$, $C/2 < A < C$, and let $x_1 > 0$ be such that $\bar F(x) \le \exp(-Ax^b)$ for all $x > x_1$. We start from the following bound:

\[ \varphi(\theta) = \int_{-\infty}^{\infty} \theta \exp(\theta x) \bar F(x)\,dx \le \exp(\theta x_1) + \int_{x_1}^{\infty} \theta \exp(\theta x - Ax^b)\,dx. \]

We again divide the integral into $x < x_2$ and $x > x_2$, where $x_2 := (2\theta/A)^{1/(b-1)}$. For $x < x_2$, we bound $\exp(\theta x - Ax^b)$ by its maximum over $(0,\infty)$. For $x > x_2$, $\exp(\theta x - Ax^b) \le \exp(-(C/4)x^b)$. Letting $B := A(b-1)(Ab)^{-b/(b-1)}$, and assuming that $\theta$ is large enough such that $x_2 > x_1$, we get

\[ \int_{x_1}^{\infty} \theta \exp(\theta x - Ax^b)\,dx \le (x_2 - x_1)\,\theta \exp(B\theta^{b/(b-1)}) + \theta \int_{x_2}^{\infty} \exp(-(C/4)x^b)\,dx. \]

Thus, when $\theta \to \infty$,

\[ \varphi(\theta) = O(\theta^{b/(b-1)}) \exp(B\theta^{b/(b-1)}). \tag{A.10} \]

Taking logs and letting $\theta \to \infty$, we get

\[ \limsup_{\theta\to\infty} \theta^{-b/(b-1)} \Lambda(\theta) \le B. \]

Then letting $A$ tend to $C$, we obtain the upper bound in (A.7).

Now, for $x$ exceeding the mean of $F$, $\Lambda^*(x) = \sup_{\theta \ge 0}(\theta x - \Lambda(\theta))$, and starting from (A.10), we obtain

\[ \Lambda^*(x) \ge \sup_{\theta}(\theta x - B\theta^{b/(b-1)}) - \log 2 = Ax^b - \log 2. \]

Therefore,

\[ \liminf_{x\to\infty} x^{-b} \Lambda^*(x) \ge A. \]

Then, letting $A$ tend to $C$, we obtain the lower bound in (A.9). □
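The two suprema used in this proof are Legendre transforms of each other, and the constant $B = A(b-1)(Ab)^{-b/(b-1)}$ can be sanity-checked numerically. The following is a minimal sketch that compares a brute-force grid maximum with the closed forms; the grid ranges, step count and parameter values are arbitrary choices made for illustration.

```python
def legendre_constant(A: float, b: float) -> float:
    """B such that sup_x (theta*x - A*x**b) = B * theta**(b/(b-1)), for b > 1."""
    return A * (b - 1.0) * (A * b) ** (-b / (b - 1.0))


def sup_on_grid(f, lo: float, hi: float, steps: int = 100_000) -> float:
    """Brute-force maximum of f over an evenly spaced grid on [lo, hi]."""
    return max(f(lo + (hi - lo) * i / steps) for i in range(steps + 1))


def check_duality(A: float, b: float, theta: float, x: float):
    """Return (grid sup, closed form) pairs for the primal and dual problems."""
    B = legendre_constant(A, b)
    r = b / (b - 1.0)
    primal_grid = sup_on_grid(lambda u: theta * u - A * u ** b, 0.0, 10.0)
    dual_grid = sup_on_grid(lambda s: s * x - B * s ** r, 0.0, 50.0)
    return (primal_grid, B * theta ** r), (dual_grid, A * x ** b)


if __name__ == "__main__":
    print(check_duality(A=1.5, b=3.0, theta=4.0, x=1.2))
```

Each pair agrees to within the grid resolution, which is the identity $\sup_\theta(\theta x - B\theta^{b/(b-1)}) = Ax^b$ used to pass from (A.10) to the lower bound on $\Lambda^*$.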

We now define $\gamma$, first appearing in Section 5.1. Our function $\gamma$ depends on certain
quantities listed in the following lemma. It also depends on the quantity $\zeta$, which we take
as that defined in (3). It is only through its dependence on $\zeta$ that $\gamma$ is affected by the
geometry of 

Lemma A.9. Consider a distribution $F$ on the real line, possibly discrete but not a point mass, with finite mean $\mu$ and finite moment-generating function at some positive $\theta > 0$, and let $\Lambda^*$ denote its rate function. Let $\nu \le \mu$, and fix $\beta, \zeta \in [0,\infty)$.

1. Assume that $\zeta > 0$. If $0 < \beta < 1/\zeta$, or $\beta = 0$ and $F \in \mathrm{AEP}(b,C)$ for some $b \ge 2$ and $C > 0$, then there is a unique solution $\gamma = \gamma(F,\nu,\zeta,\beta)$ to the following equation

\[ \inf_{\beta \le s \le 1/\zeta} [s\Lambda^*(\nu + \sqrt{\gamma/s}) + s\zeta] = 1. \]

2. Assume that $\zeta = 0$. The foregoing holds as long as $\nu = \mu$ (and with $1/\zeta$ interpreted as $\infty$).

Proof. Let $M = \sup\{x: \Lambda^*(x) < \infty\}$. Because $F$ is not a point mass, $\mu < M \le \infty$. Define

\[ G(s,\gamma) = s\Lambda^*(\nu + \sqrt{\gamma/s}) + s\zeta. \]

Note that $G(s,\gamma)$ is finite (resp., infinite) if $\gamma/s < (M-\nu)^2$ (resp., $\gamma/s > (M-\nu)^2$). In addition, $G(s,\gamma)$ and its derivatives are continuous wherever $G$ is finite, and thus are uniformly continuous on any compact subset of $[0,\infty)^2$ on which $G$ is finite. Furthermore, $G(s,\gamma)$ is strictly increasing in $\gamma$ on the interval $(0, s(M-\nu)^2)$. Let

\[ L_\beta(\gamma) = \inf_{\beta \le s \le 1/\zeta} G(s,\gamma). \tag{A.11} \]

Thus $L_\beta(\gamma)$ is finite if $\gamma\zeta < (M-\nu)^2$, and infinite when $<$ is replaced by $>$. Furthermore, for $\gamma < (M-\nu)^2/\zeta$, the infimum is achieved at some value $s_\gamma$ of $s$ in a neighborhood where $G(s,\gamma) < \infty$.

Assume first that $\beta > 0$. It may be seen that $L_\beta(\gamma)$ is continuous and strictly increasing in $\gamma$ on the interval $[0, (M-\nu)^2/\zeta)$. Let $0 \le \gamma < \gamma' < (M-\nu)^2/\zeta$. Then

\[ 0 \le L_\beta(\gamma') - L_\beta(\gamma) \le G(s_\gamma,\gamma') - G(s_\gamma,\gamma), \tag{A.12} \]

and continuity follows from the properties of $G$ noted earlier. Similarly,

\[ L_\beta(\gamma') - L_\beta(\gamma) \ge G(s_{\gamma'},\gamma') - G(s_{\gamma'},\gamma) \tag{A.13} \]

and strict monotonicity follows similarly.

It suffices to prove that $L_\beta(\gamma)$ takes values $<1$ and finite values $>1$. The first claim follows from the fact that, with $\gamma = \beta(\mu-\nu)^2$,

\[ L_\beta(\gamma) \le G(\beta,\gamma) = \beta\zeta < 1. \]

We now turn to the second claim, and make use of two general properties of rate functions that follow from Dembo and Zeitouni [15], Equation (2.2.10), Lemma 2.2.20. It is standard that $\Lambda^*(\mu + x) \sim \tfrac12 (x/\sigma)^2$ as $x \downarrow 0$, where $\sigma^2 > 0$ is the variance of $F$. Therefore,

\[ \exists\, T \in (0, M) \text{ such that } \Lambda^*(\mu + x) \ge \tfrac14 (x/\sigma)^2 \text{ when } 0 \le x \le T. \tag{A.14} \]

With $T$ thus chosen, by convexity,

\[ \exists\, A > 0 \text{ such that } \Lambda^*(\mu + x) \ge Ax \text{ when } x \ge T. \tag{A.15} \]

Assume first that $\zeta > 0$ and $M = \infty$. By (A.15), for sufficiently large $\gamma$,

\[ \infty > L_\beta(\gamma) \ge \inf_{\beta \le s \le 1/\zeta} [sA(\nu - \mu + \sqrt{\gamma/s}) + s\zeta] \ge A(\beta(\nu - \mu) + \sqrt{\gamma\beta}) > 1. \]

Suppose next that $\zeta > 0$ and $M < \infty$. Let $0 < \gamma < (M-\nu)^2/\zeta$. Because $\Lambda^*(\nu + \sqrt{\gamma/s}) = \infty$ if $s < \gamma/(M-\nu)^2 =: \beta_0(\gamma)$,

\begin{align*}
\infty > L_\beta(\gamma) &\ge \beta_0 \inf_{\beta_0 \le s \le 1/\zeta} \Lambda^*(\nu + \sqrt{\gamma/s}) + \beta_0\zeta \\
&= \beta_0 \Lambda^*(\nu + \sqrt{\gamma\zeta}) + \beta_0\zeta. \tag{A.16}
\end{align*}

The limit of this, as $\gamma \uparrow (M-\nu)^2/\zeta$, is strictly greater than 1.

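Now let $\zeta = 0$ and $\nu = \mu$, and note that $L_\beta(\gamma) < \infty$ for all $\gamma > 0$. Suppose that $M \le \infty$ and $\gamma > 0$. By dividing the infimum in (A.11) according to whether or not $\sqrt{\gamma/s} \le T$, we find that

\[ \infty > L_\beta(\gamma) \ge \min\Big\{ \inf_{s \ge \beta,\ \sqrt{\gamma/s} \le T} s\Lambda^*(\mu + \sqrt{\gamma/s}),\ \inf_{s \ge \beta,\ \sqrt{\gamma/s} > T} s\Lambda^*(\mu + \sqrt{\gamma/s}) \Big\} \ge \min\Big\{ \frac{\gamma}{4\sigma^2},\ A\sqrt{\gamma\beta} \Big\}, \]

by (A.14)-(A.15). This diverges as $\gamma \to \infty$.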

When $\beta = 0$, some of the arguments fail, because $G(s,\gamma)$ might not be continuous at $(0,0)$. Assume that $F \in \mathrm{AEP}(b,C)$ for some $b \ge 2$ and $C > 0$. Note that $M = \infty$ by Lemma A.8. If $b = 2$, $G(s,\gamma) \to C\gamma$ when $\gamma > 0$ is fixed and $s \to 0$, by Lemma A.8, and taking this limit as an extension at $s = 0$, the same arguments used in the case $\beta > 0$ apply. If $b > 2$, we need slightly different arguments. As before, let $s_\gamma$ be a minimizer of $G(s,\gamma)$. We have that $s_\gamma$ is well defined for all $\gamma$ and strictly positive, because $G$ is uniformly continuous on any compact subset of $(0, 1/\zeta] \times [0,\infty)$ and $G(s,\gamma) \sim C\gamma^{b/2} s^{1-b/2} \to \infty$ when $s \to 0$. Thus we may proceed as before in (A.12)-(A.13), obtaining that $L_0(\gamma)$ is strictly increasing and continuous. As before, we turn to proving that $L_0$ takes values $<1$ and finite values $>1$. First, with $\gamma = (\mu-\nu)^2/(2\zeta)$ and $s = 1/(2\zeta)$,

\[ L_0(\gamma) \le G(s,\gamma) = \gamma\zeta/(\mu-\nu)^2 = 1/2 < 1. \]

Next, showing that $L_0$ takes finite values above 1 is done exactly as before, except that (A.14) is replaced by

\[ G(s,\gamma) \ge (1 + o(1))\, C\gamma^{b/2}\zeta^{b/2-1}, \quad \gamma \to \infty, \]

by Lemma A.8. □
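Because $L_\beta$ is continuous and strictly increasing, the solution $\gamma$ of Lemma A.9 can be located by bisection. The sketch below is one way to do this numerically under stated assumptions: it takes $\beta > 0$ and $\zeta > 0$, approximates the infimum over $s$ on a finite grid, and uses a one-sided Gaussian rate function as the example; all function names are hypothetical.

```python
def gaussian_rate(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """One-sided Gaussian rate function: zero at or below the mean."""
    d = max(x - mu, 0.0)
    return d * d / (2.0 * sigma * sigma)


def L_beta(gamma: float, nu: float, zeta: float, beta: float, rate, grid: int = 2000) -> float:
    """Grid approximation of inf over s in [beta, 1/zeta] of G(s, gamma)."""
    hi = 1.0 / zeta
    best = float("inf")
    for i in range(grid + 1):
        s = beta + (hi - beta) * i / grid
        best = min(best, s * rate(nu + (gamma / s) ** 0.5) + s * zeta)
    return best


def solve_gamma(nu: float, zeta: float, beta: float, rate, tol: float = 1e-10) -> float:
    """Bisection for L_beta(gamma) = 1, relying on monotonicity in gamma."""
    lo, hi = 0.0, 1.0
    while L_beta(hi, nu, zeta, beta, rate) <= 1.0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if L_beta(mid, nu, zeta, beta, rate) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Standard normal with nu = mu = 0: G(s, gamma) = gamma/2 + s*zeta,
    # so the infimum sits at s = beta and gamma = 2 * (1 - beta*zeta).
    print(solve_gamma(nu=0.0, zeta=0.5, beta=0.4, rate=gaussian_rate))
```

In the standard normal case with $\nu = \mu$, the closed form $\gamma = 2(1 - \beta\zeta)$ gives a direct check of the solver.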



The following result describes the variations of $\gamma$ (defined in Lemma A.9) with the parameter of an exponential family.

Lemma A.10. Consider a natural exponential family of distributions $(F_\theta, \theta \ge 0)$ and let $\mu_\theta$ and $\Lambda^*_\theta$ denote the mean and the rate function of $F_\theta$, respectively. Let $\zeta_\theta$ be a continuous and decreasing function of $\theta$. Then, for any fixed $0 < \beta < 1/\zeta_0$, $\gamma_\theta := \gamma(F_\theta, \mu_0, \zeta_\theta, \beta)$ is continuous and strictly increasing in $\theta$. Moreover, if $\zeta_\theta \to 0$ when $\theta \to \theta_c$, then $\gamma_\theta \to \infty$ when $\theta \to \theta_c$.

Proof. First, note that $\mu_\theta \ge \mu_0$ (Brown [7], Cor. 2.22) so that $\gamma_\theta$ is well-defined. That $\gamma_\theta$ is strictly increasing comes from the fact that both $\zeta_\theta$ and $\Lambda^*_\theta(a)$ ($a \ge \mu_\theta$ fixed) are decreasing. The latter can be seen from

\[ \Lambda^*_\theta(a) = -\lim_{k\to\infty} \frac1k \log P_\theta(\bar X_k \ge a), \]

where $\bar X_k$ is the average of the sample of size $k$ from $F_\theta$ (Brown [7], Cor. 2.22), and the fact that the distribution of $\bar X_k$ as $\theta$ varies forms a natural exponential family with parameter $k\theta$. That $\gamma_\theta$ is continuous comes from the continuity of $\zeta_\theta$ and $\Lambda^*_\theta(a)$ (in $(\theta, a)$).

For the behavior near $\theta_c$, note that $\Lambda^*_\theta(a) = 0$ for $a \le \mu_\theta$, so that $G(1/(2\zeta_\theta), \gamma) = 1/2$ for any $\gamma \le (\mu_\theta - \mu_0)^2/(2\zeta_\theta)$. Combine this with the fact that $\mu_\theta$ is strictly increasing in $\theta$ to see that $\gamma_\theta$ is of order at least $1/\zeta_\theta$. In fact, it is easy to see that $\gamma_\theta \sim (\mu_\theta - \mu_0)^2/\zeta_\theta$ when $\theta \uparrow \theta_c$. □



A.3. Main proofs

A.3.1. Proof of Theorem 1

By monotonicity, it is sufficient to assume that $\theta_m = \theta$ for all $m$. Fix $t$ and, for short, let $p = p_0(t)$ and $p' = p_\theta(t)$. First, assume that $\theta > \theta^*_t$, so that $\zeta_{p'} < \alpha\zeta_p$. Fix $B$ such that $1/\zeta_p < B < \alpha/\zeta_{p'}$ and consider the test with rejection region $\{S_m(t) > dB\log m\}$. Under $\mathbb{H}_0^m$, we have $S_m(t) = (1 + o_P(1))(d/\zeta_p)\log m$ by (4), so that $P(S_m(t) > dB\log m) \to 0$. Under $\mathbb{H}_{1,K}^m$, $S_m(t) \ge S_K(t) = (1 + o_P(1))(\alpha d/\zeta_{p'})\log m$, so that $P(S_m(t) > dB\log m) \to 1$. Thus this test is asymptotically powerful.

Now assume that $\theta < \theta^*_t$, so that $\zeta_{p'} > \alpha\zeta_p$ and there is $B$ such that $\alpha/\zeta_{p'} < B < 1/\zeta_p$. Let $K^c = V_m \setminus K$. It is sufficient to show that under both $\mathbb{H}_0^m$ and $\mathbb{H}_{1,K}^m$, $S_m(t) = S_{K^c}(t)$ with probability tending to 1, so that the values at the nodes in $K$ have no influence on $S_m(t)$. Indeed, let $J$ be a hypercube within $V_m$ of side length $[m/3]$ which does not intersect $K$. Then $S_{K^c}(t) \ge S_J(t)$, and the distribution of $S_J(t)$ is the same under both $\mathbb{H}_0^m$ and $\mathbb{H}_{1,K}^m$. In addition, $P(S_J(t) > dB\log m) \to 1$ by (4). Now, let $L$ be the set of nodes within (supnorm) distance $(\log m)^2$ from $K$, so that $L$ is a hypercube of side length $[m^\alpha] + [2(\log m)^2]$ containing $K$ in its interior. In the event that $\{S_m(t) < (\log m)^2\}$, $S_m(t) \ne S_{K^c}(t)$ only when $S_L(t) > S_{K^c}(t)$. The distribution of $S_L(t)$ under the null is stochastically bounded by its distribution under $\mathbb{H}_{1,K}^m$, which is itself bounded by its distribution under $\mathbb{H}_{1,L}^m$. Even under the latter, $P(S_L(t) > dB\log m) \to 0$ by (4). We then conclude the proof using the fact that $P(S_m(t) < (\log m)^2) \to 1$, again by (4).
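The statistic $S_m(t)$ in this proof is the size of the largest connected component of nodes whose value exceeds the threshold. As a minimal sketch, assuming a two-dimensional square grid with 4-neighbour connectivity and treating a node as open when its value is strictly greater than $t$ (the function names and the list-of-lists input format are illustrative choices), it can be computed with a breadth-first search:

```python
import math
from collections import deque


def largest_open_cluster(values, t):
    """Size of the largest 4-connected set of grid nodes with value > t."""
    rows, cols = len(values), len(values[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or values[r][c] <= t:
                continue
            # Breadth-first search over the open cluster containing (r, c).
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                i, j = queue.popleft()
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < rows and 0 <= b < cols and not seen[a][b] and values[a][b] > t:
                        seen[a][b] = True
                        queue.append((a, b))
            best = max(best, size)
    return best


def rejects(values, t, B, d=2):
    """Rejection region {S_m(t) > d * B * log m} for an m-by-m grid."""
    m = len(values)
    return largest_open_cluster(values, t) > d * B * math.log(m)
```

The search visits each node once, so the cost is linear in the number of nodes, in contrast to scanning over a prescribed family of cluster shapes.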

A.3.2. Proof of Theorem 2

Here we use the notation and follow the arguments of Section A.3.1. In addition, let $\zeta_{p'} = \log(1/p')$, that is, the function $\zeta$ in dimension one. When $\theta > \theta_t^+$, we consider $1/\zeta_p < B < \alpha/(d\zeta_{p'})$. Under $\mathbb{H}_0^m$, we still have $S_m(t) = (1 + o_P(1))(d/\zeta_p)\log m$. Under $\mathbb{H}_{1,K}^m$, $S_m(t) \ge S_K(t) = (1 + o_P(1))(\alpha/\zeta_{p'})\log m$, because $K$ is isomorphic to a subinterval of the one-dimensional lattice. We conclude as before that the test with rejection region $\{S_m(t) > dB\log m\}$ is asymptotically powerful.

When $\theta < \theta_t^-$, we consider $\alpha/(d\zeta_{p'}) < B < 1/\zeta_p$. As before, let $L$ be the set of nodes within (supnorm) distance $(\log m)^2$ from $K$, so that $L$ is now a band. As before, it suffices to prove that $P(S_L(t) > dB\log m) \to 0$ under $\mathbb{H}_{1,L}^m$. Although (4) cannot be applied, because $L$ is not isomorphic to a square lattice, its proof via the union bound and (3) applies. Indeed, fix $\eta > 0$ small enough that $(1-\eta)\zeta_{p'} dB > \alpha$. Then, for $m$ large enough, we have

\begin{align*}
P(S_L(t) > dB\log m) &\le |L| \cdot P(S > dB\log m) \\
&\le O(m^\alpha (\log m)^{2(d-1)}) \exp(-(1-\eta)\zeta_{p'} dB\log m) \\
&= O(\log m)^{2(d-1)} \exp((\alpha - (1-\eta)\zeta_{p'} dB)\log m) \to 0.
\end{align*}
A.3.3. Proof of Proposition 1

Let $k_m(\varepsilon) = (1-\varepsilon)\,d\log(m)/\log(1/p_0(t_m))$ with $\varepsilon > 0$ fixed. We first show that $S_m(t_m) \ge k_m(\varepsilon)$ with probability tending to 1 under $\mathbb{H}_0^m$. We use the notation and arguments provided in the proof of Lemma A.2. As in (A.4),

P(5„(t„,) < fc„,(e)) < (1 - F{S > Kn{e))f 

< exp(-m=''/(logm)2'^) -> 0, 

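where the second inequality holds for $m$ large enough by Lemma A.3.

Assume that $\theta_m \le \theta < \infty$ for all $m$. Proceeding as in Section A.3.1 and using the slightly larger region $L$, it is sufficient to show that for $\varepsilon$ small enough, $S_L(t_m) < k_m(\varepsilon)$ with probability tending to 1 when $X_v \sim F_\theta$ for all $v \in L$. Using the union bound and the fact that $|L| = O(m)^{\alpha d}$, we have

\[ P(S_L(t_m) \ge k_m(\varepsilon)) \le |L| \cdot P(S \ge k_m(\varepsilon)) \le O(m)^{\alpha d} (c\,p_\theta(t_m))^{k_m(\varepsilon)}, \tag{A.17} \]

where the last inequality is due to Lemma A.3 (and $c$ is the constant that appears there). Through integration by parts, for $\theta > 0$ and $\varepsilon \in (0,1)$ fixed, we have $p_\theta(t) \le p_0((1-\varepsilon)t)$ for sufficiently large $t$. Indeed, for $t$ large enough,

\begin{align*}
p_\theta(t) &= \exp(\theta t - \Lambda(\theta))\,p_0(t) + \int_t^\infty \theta \exp(\theta x - \Lambda(\theta))\,p_0(x)\,dx \\
&\le \exp(\theta t - \Lambda(\theta) - C(1-\varepsilon/3)^b t^b) + \int_t^\infty \theta \exp(\theta x - \Lambda(\theta) - C(1-\varepsilon/3)^b x^b)\,dx \\
&\le \exp(-C(1-\varepsilon/2)^b t^b) \\
&\le p_0((1-\varepsilon)t),
\end{align*}

where we used the fact that $b > 1$ in line 3 and the fact that $\log p_0(t) \sim -Ct^b$ as $t \to \infty$ (because $F_0 \in \mathrm{AEP}(b,C)$) in lines 2 and 4. The last property also implies that $p_0((1-\varepsilon)t) \le p_0(t)^{(1-\varepsilon)^{b+1}}$ for large $t$. Thus, for $m$ large enough, $p_\theta(t_m) \le p_0(t_m)^{(1-\varepsilon)^{b+1}}$, so that taking logs in (A.17), we get

\[ \log P(S_L(t_m) \ge k_m(\varepsilon)) \le O(1) + (d\log m)\big(\alpha + O(\log p_0(t_m))^{-1} - (1-\varepsilon)^{b+2}\big) \to -\infty, \]

when $\varepsilon < 1 - \alpha^{1/(b+2)}$. (Remember that $\alpha < 1$ and that $p_0(t_m) \to 0$, so the middle term is small.)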
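For readers who want the last step spelled out, here is a short worked computation of the three terms obtained by taking logs in (A.17). It is a sketch that assumes the exponents in the preceding bounds read $(1-\varepsilon)^{b+1}$ and $(1-\varepsilon)^{b+2}$, and it uses only the definition $k_m(\varepsilon) = (1-\varepsilon)\,d\log(m)/\log(1/p_0(t_m))$.

```latex
\begin{align*}
\log |L| &= \alpha d \log m + O(1), \\
k_m(\varepsilon)\,\log c
  &= (1-\varepsilon)\, d\log m \cdot \frac{\log c}{\log(1/p_0(t_m))}
   = (d\log m)\, O\big(\log p_0(t_m)\big)^{-1}, \\
k_m(\varepsilon)\,\log p_\theta(t_m)
  &\le k_m(\varepsilon)\,(1-\varepsilon)^{b+1} \log p_0(t_m)
   = -(1-\varepsilon)^{b+2}\, d\log m .
\end{align*}
```

Summing the three lines gives the stated bound, and the leading coefficient $\alpha - (1-\varepsilon)^{b+2}$ is negative exactly when $\varepsilon < 1 - \alpha^{1/(b+2)}$.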

A.3.4. Proof of Theorem 3

Let $E_\theta$ denote the expectation of $X_v$ under $F_\theta$. By Lemma A.5, under the null,

\[ \frac{S_m(t) - E_0(S_m(t))}{\sqrt{\mathrm{Var}_0(S_m(t))}} \Rightarrow \mathcal{N}(0,1), \tag{A.18} \]

with $\mathrm{Var}_0(S_m(t))$ of order $m^d$. Write $p := p_0(t)$ and $p' := p_{\theta_m}(t)$.
We consider the alternative with anomalous cluster $K$ as a two-stage percolation process, where the first stage is percolation on $V_m$ with probability $p$, as under the null, and the second stage is percolation on the closed nodes within $K$, that is, $K \setminus \{v: X_v > t\}$, with (conditional) probability $(p'-p)/(1-p)$. An open cluster at the first stage is called small if it is not a largest open cluster.
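The two-stage construction is a coupling: every node open under the null stays open under the alternative, and each closed node inside the anomalous cluster gets a second chance with probability $(p'-p)/(1-p)$, so that its marginal probability of being open is $p'$. A minimal sketch follows; the node indexing, function names and seeded generator are illustrative assumptions.

```python
import random


def marginal_open_probability(p: float, p_alt: float) -> float:
    """Probability that a node in the anomalous cluster is open after both stages."""
    return p + (1.0 - p) * ((p_alt - p) / (1.0 - p))


def two_stage_percolation(n_nodes, p, p_alt, anomalous, seed=0):
    """Return (stage1, stage2) lists of open/closed flags for nodes 0..n_nodes-1.

    stage1: every node is open with probability p (the null).
    stage2: closed nodes in `anomalous` are opened with probability (p_alt - p)/(1 - p).
    """
    rng = random.Random(seed)
    q = (p_alt - p) / (1.0 - p)
    stage1 = [rng.random() < p for _ in range(n_nodes)]
    stage2 = list(stage1)
    for v in anomalous:
        if not stage1[v] and rng.random() < q:
            stage2[v] = True
    return stage1, stage2
```

Because the second stage only ever opens nodes, the largest open cluster can only grow, which is what makes the difference $\Delta_m$ in the proof nonnegative.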

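We may assume, except where noted below, that $\theta_m \to 0$. Because

\[ \frac{\partial}{\partial\theta} \log p_\theta(t) = E_\theta(X_v \mid X_v > t) - E_\theta(X_v), \]

which is positive at $\theta = 0$ by choice of $t$, there exists $c \in (0,\infty)$ such that

\[ p' - p \sim c\,\theta_m \quad \text{as } m \to \infty. \tag{A.19} \]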

Let $\Delta_m \ge 0$ be the difference between the sizes of the largest clusters under the null and the alternative. For $x \in K$, let $A_x$ be the sum of the sizes of all small clusters of the entire lattice that contain some neighbor of $x$. Note that $\Delta_m \le \sum_{x \in D}(1 + A_x)$, where $D$ is the set of $x \in K$ that are closed at the first stage and open at the second stage. Therefore, $\Delta_m$ has expectation bounded above by

\[ E(\Delta_m) \le |K| \Big(\frac{p'-p}{1-p}\Big)(1 + 2d\mu_p), \tag{A.20} \]

where $\mu_p < \infty$ is the mean size of a finite open cluster in the infinite lattice.

By (A.19) and the foregoing, $E(\Delta_m) \le C\theta_m m^{\alpha d}$ for some $C < \infty$. By Markov's inequality, $\Delta_m = O_P(\theta_m m^{\alpha d})$.

Thus, if $\theta_m m^{(\alpha-1/2)d} \to 0$, then $\Delta_m/\sqrt{\mathrm{Var}_0(S_m(t))} \to 0$ in probability, implying that the same central limit law as (A.18) holds under the alternative, so that the test based on the largest open cluster is asymptotically powerless. We also must consider the case where $\theta_m \not\to 0$, for which a similar argument is valid.

Now assume that $\alpha > 1/2$ and $\theta_m m^{(\alpha-1/2)d} \to \infty$. By Grimmett [21], Theorem 8.99, and standard properties of the largest cluster in a box (to be found in, e.g., Falconer and Grimmett [18]), with probability tending to 1, the largest open cluster increases in size by at least $C_1(p'-p)|K|$ for some $C_1 = C_1(p) > 0$. By (A.19), this has order $\theta_m m^{\alpha d}$. Because

\[ \frac{\theta_m m^{\alpha d}}{\sqrt{\mathrm{Var}_0(S_m(t))}} \ge C_2\,\theta_m m^{(\alpha-1/2)d} \to \infty \]

for some $C_2 = C_2(p) > 0$, the test based on the largest open cluster is asymptotically powerful.

A.3.5. Proof of Theorem 4

We may assume without loss of generality that $\theta_m \to 0$ as $m \to \infty$. By (5) and the assumption on $t_m$, we have that $S_m(t_m) \asymp_P \log m$ under the null. Now $p_\theta(t)$ is infinitely differentiable in $\theta$, with each derivative continuous in $t$ and with

\[ \frac{\partial p_\theta(t)}{\partial\theta}\Big|_{\theta=0} = p_0(t)\,[E_0(X_v \mid X_v > t) - E_0(X_v)] \ge \frac{p_c}{2}\,[E_0(X_v \mid X_v > t_c) - E_0(X_v)] > 0, \]

uniformly for $t$ in a neighborhood of $t_c$. Therefore, there exists $C > 0$ such that

\[ \frac{\partial p_\theta(t)}{\partial\theta} \ge 1/C \quad\text{and}\quad \Big|\frac{\partial^2 p_\theta(t)}{\partial\theta^2}\Big| \le C \]

for $(\theta, t)$ in some neighborhood of $(0, t_c)$. Thus,

\[ p_\theta(t) - p_0(t) \ge \theta/C - C\theta^2/2 \ge \theta/(2C), \]

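on such a neighborhood. Let $A$ and $B$ be such that $p_c - p_0(t_m) \le A m^{-\alpha/\nu'}$ and $\theta_m \ge B m^{-\alpha/\nu'}$, and assume that $B > 2AC$, based on the statement of the theorem. Because $\theta_m \to 0$ and $t_m \to t_c$,

\[ m^{\alpha/\nu''}\,(p_{\theta_m}(t_m) - p_c) \ge m^{\alpha/\nu''} \Big[\frac{\theta_m}{2C} + (p_0(t_m) - p_c)\Big] \ge \Big(\frac{B}{2C} - A\Big)\, m^{\alpha(1/\nu'' - 1/\nu')} \to \infty \]

for $\nu'' < \nu'$ and sufficiently large $m$. By (5) applied to $K \in \mathcal{K}_m$, it follows that $S_K(t_m) \asymp_P m^{\alpha d}$ under the alternative. Consequently, the test with rejection region $\{S_m(t_m) \ge (\log m)^2\}$ is asymptotically powerful.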



A.3.6. Proof of Lemma 5

Part 1. This follows immediately from Lemma A.2.

Therefore, we focus on the remaining two parts. We use the abbreviated notation $F := F_{\theta|t}$, $\Lambda^* := \Lambda^*_{\theta|t}$, $\mu := \mu_{\theta|t}$, $\zeta := \zeta_{p_\theta(t)}$, $\gamma := \gamma_{\theta|t}(\beta)$, $U_m := U_m(t, k_m)$, and write $\nu := \mu_{0|t}$. Let $\bar Y_K = \bar X_K - \nu$. As in Lemma A.4, let $N_m(k)$ denote the number of open clusters of size $k$ within $V_m$, and define

\[ G_k(x) = P(k^{1/2}\,\bar Y_k \le x), \qquad \bar G_k := 1 - G_k, \]

where $\bar Y_k = \bar X_k - \nu$ and $\bar X_k$ is the average of an i.i.d. sample of size $k$ from $F$. By the independence of $\bar Y_K$ and $\bar Y_L$ for $K, L \in \mathcal{Q}_m$ distinct, we have

\[ P(U_m \le x) = E\Big(\prod_{k \ge k_m} G_k(x)^{N_m(k)}\Big) = E(\exp[-R_m(x)]), \]

where

\[ R_m(x) := -\sum_{k \ge k_m} N_m(k) \log(1 - \bar G_k(x)). \]

Thus, we turn to bounding $R_m(x)$.
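To make the object under study concrete, the sketch below computes a fixed-threshold scan of the form $\max_Q \sqrt{|Q|}\,(\bar X_Q - \nu)$ over open clusters $Q$ of size at least $k_m$, which is how the product formula above treats $U_m(t, k_m)$. It assumes a two-dimensional grid with 4-neighbour connectivity and takes the centering constant $\nu$ as an input; the function names are hypothetical.

```python
import math
from collections import deque


def open_clusters(values, t):
    """Return the open clusters (value > t, 4-neighbour connectivity) as lists of node values."""
    rows, cols = len(values), len(values[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or values[r][c] <= t:
                continue
            seen[r][c] = True
            queue, members = deque([(r, c)]), []
            while queue:
                i, j = queue.popleft()
                members.append(values[i][j])
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < rows and 0 <= b < cols and not seen[a][b] and values[a][b] > t:
                        seen[a][b] = True
                        queue.append((a, b))
            clusters.append(members)
    return clusters


def fixed_threshold_scan(values, t, k_min, nu):
    """Max over open clusters Q with |Q| >= k_min of sqrt(|Q|) * (mean(Q) - nu)."""
    scores = [
        math.sqrt(len(q)) * (sum(q) / len(q) - nu)
        for q in open_clusters(values, t)
        if len(q) >= k_min
    ]
    return max(scores) if scores else float("-inf")
```

Only the connected components at the chosen threshold are scored, so no family of candidate cluster shapes has to be specified in advance.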
Part 2. Define $x_m = \sqrt{\gamma d\log m}$ and fix $\varepsilon > 0$. For the lower bound, let $\ell_m$ be the closest integer to $a\,d\log m$ between $k_m$ and $(d/\zeta)\log m$, where

\[ a = \operatorname*{argmin}_{\beta \le s \le 1/\zeta} [s\Lambda^*(\nu + \sqrt{\gamma/s}) + s\zeta]. \tag{A.21} \]

We have

\[ R_m((1-\varepsilon)x_m) \ge T_m := N_m(\ell_m)\,\bar G_{\ell_m}((1-\varepsilon)x_m), \]

and we show that for $\varepsilon$ fixed, $T_m \to \infty$ in probability. Fix $\eta > 0$. On the one hand, we use Lemma A.4 and (3), to get

\[ E(N_m(\ell_m)) \ge \frac{(m-\ell_m)^d}{\ell_m}\,P(|S| = \ell_m) \ge m^d \exp(-(1+\eta)\zeta\ell_m) \]

for $m$ large enough. On the other hand, we use Cramér's theorem (Dembo and Zeitouni [15], Theorem 2.2.3) to get

\begin{align*}
\bar G_{\ell_m}((1-\varepsilon)x_m) &\ge P(\bar Y_{\ell_m} > (1-\varepsilon/2)\sqrt{\gamma/a}) \\
&\ge \exp(-(1+\eta)\,\ell_m \Lambda^*[\nu + (1-\varepsilon/2)\sqrt{\gamma/a}])
\end{align*}

for $m$ large enough. By the definition of $\gamma$, $a\Lambda^*[\nu + \sqrt{\gamma/a}] + a\zeta = 1$, and thus for $\varepsilon$ small enough,

\[ a\zeta + a\Lambda^*[\nu + (1-\varepsilon/2)\sqrt{\gamma/a}] < 1, \]

by strict monotonicity, as in the proof of Lemma A.9. Thus, for $\eta$ small enough,

\[ (1+\eta)\,\ell_m \big(\zeta + \Lambda^*[\nu + (1-\varepsilon/2)\sqrt{\gamma/a}]\big) \le (1-\eta)\,d\log m. \]

It follows that

\[ E(T_m) \ge m^{\eta d}. \]

To bound the corresponding variance, we use Lemma A.4 to obtain

\[ \mathrm{Var}(T_m) \le O(\log m)^d\, E(T_m), \]

and it follows by Chebyshev's inequality that indeed $T_m \to \infty$ in probability. Because $T_m \ge 0$, $\exp(-T_m) \to 0$ in $L^1$, and thus

\[ P(U_m \le (1-\varepsilon)x_m) \to 0. \]

We next show that $E(R_m((1+\varepsilon)x_m)) \to 0$, which will imply the claim of Part 2. Fix $\eta > 0$. We have that

\[ R_m((1+\varepsilon)x_m) \le T_m + 2Z_m, \tag{A.22} \]

where

\[ T_m := 2 \sum_{k=k_m}^{k_m^{(\eta)}} N_m(k)\,\bar G_k((1+\varepsilon)x_m) \]

and $Z_m$ is the number of clusters of size exceeding $k_m^{(\eta)} := [(1+\eta)(d/\zeta)\log m]$. We first note that, as in the proof of Lemma A.4, for large $m$,

\[ E(Z_m) \le m^d \exp(-(1-\eta/2)\,\zeta k_m^{(\eta)}) \to 0. \tag{A.23} \]

We next turn to $T_m$, and show that for $\varepsilon$ fixed and $\eta$ small enough, $E(T_m) \to 0$. On the one hand, we use Lemma A.4 and (3) to get

\[ E(N_m(k)) \le m^d \exp(-(1-\eta)\zeta k) \]

for $m$ large enough. On the other hand, by Chernoff's bound,

\[ \bar G_k((1+\varepsilon)x_m) \le \exp(-k\Lambda^*[\nu + (1+\varepsilon)x_m/\sqrt{k}]). \]

Taken together, we obtain

\begin{align*}
E(T_m) &\le 2 \sum_{k=k_m}^{k_m^{(\eta)}} m^d \exp\big(-(1-\eta)[k\zeta + k\Lambda^*(\nu + (1+\varepsilon)x_m/\sqrt{k})]\big) \\
&\le O(\log m) \exp\Big(d\log m - (1-\eta) \min_{k_m \le k \le k_m^{(\eta)}} [k\zeta + k\Lambda^*(\nu + (1+\varepsilon)x_m/\sqrt{k})]\Big) \\
&\le O(\log m) \exp((1 - (1-\eta)A)\,d\log m),
\end{align*}

where

\[ A := \inf_{\beta \le a \le (1+\eta)/\zeta} [a\Lambda^*(\nu + (1+\varepsilon)\sqrt{\gamma/a}) + a\zeta]. \tag{A.24} \]

As in the proof of Lemma A.9, $A = A(\varepsilon,\eta)$ is continuous in $(\varepsilon,\eta)$ and strictly increasing in $\varepsilon$. Because $A(0,0) = 1$ by definition of $\gamma$, for $\varepsilon$ fixed, $-h := 1 - (1-\eta)A(\varepsilon,\eta) < 0$ for $\eta$ small enough, in which case $E(T_m) \le m^{-hd/2} \to 0$ as $m$ increases.

By (A.22)-(A.23), we have that $E(R_m((1+\varepsilon)x_m)) \to 0$. By Jensen's inequality,

\[ P(U_m \le (1+\varepsilon)x_m) \ge \exp(-E(R_m((1+\varepsilon)x_m))) \to 1, \]

and the proof of this part is complete.

Part 3. We build on the arguments provided so far, which apply essentially unchanged, except in two places. In the lower bound, instead of Cramér's theorem, we use

\[ \bar G_k(x) \ge \bar F(x/\sqrt{k})^k, \]

combined with the asymptotic behavior for $\bar F$. In the upper bound, $A$ defined in (A.24) is evaluated differently when $b < 2$.

Part 3(a). When $b > 2$, we have $a > 0$ in (A.21) (with $\beta = 0$), because

\[ h(s) := s\Lambda^*(\nu + \sqrt{\gamma/s}) + s\zeta \asymp s^{1-b/2} \to \infty \]

for $\gamma$ fixed and $s \to 0$, by Lemma A.8. When $b = 2$, we take $a$ small enough if the minimum is at $a = 0$. Then the other arguments in Part 2 apply unchanged.

Part 3(b). By the same calculations, $a = 0$ in (A.21), because $h(s) > 0$ for all $s > 0$, and $h(s) \asymp s^{1-b/2} \to 0$ when $s \to 0$, because $b < 2$. This would make $A = 0$ in (A.24) for any $\varepsilon > 0$, making the arguments for the upper bound collapse. Instead, redefine $x_m = (C^{-1} d\log m)^{1/b}\, k_m^{1/2-1/b}$. Because $x_m/\sqrt{k} \to \infty$ uniformly over $k \le k_m^{(\eta)}$, for $\eta > 0$ fixed, we have

\[ k\zeta + k\Lambda^*(\nu + (1+\varepsilon)x_m/\sqrt{k}) \ge k\zeta + (1-\eta)\,C k^{1-b/2} (1+\varepsilon)^b x_m^b \]

for $m$ large enough, by Lemma A.8. Then the term on the right-hand side takes its minimum over $k_m \le k \le k_m^{(\eta)}$ at $k = k_m$, and from here, the remaining arguments apply.

A.3.7. Proof of Proposition 2

Assume, for simplicity, that $\theta_m = \theta < \infty$ for all $m$. The key point is that $F_{\theta|t} \in \mathrm{AEP}(b,C)$. Indeed, we have $\bar F_{\theta|t}(x) = \bar F_\theta(x)/\bar F_\theta(t)$, where the denominator is constant in $x$ and, integrating by parts,

\[ \bar F_\theta(x) = \exp(\theta x - \Lambda(\theta))\,\bar F_0(x) + \int_x^\infty \theta \exp(\theta y - \Lambda(\theta))\,\bar F_0(y)\,dy. \]

From here, we reason as in the proof of Proposition 1, using the fact that $\log \bar F_0(y) \sim -Cy^b$ when $y \to \infty$, with $b > 1$. Thus $F_{\theta|t}$ and $F_{0|t}$ have the same (first-order) asymptotics, and so nothing distinguishes the asymptotic behavior of $U_m$ under the null and under an alternative. In detail, we proceed as in Section A.3.1, with the enlarged hypercube $L$, and show that in probability under $\mathbb{H}_{1,L}^m$,

\[ \limsup_{m\to\infty}\, k_m^{1/b-1/2} (\log m)^{-1/b}\, U_L < (d/C)^{1/b}, \]

where $U_L$ is the ULS scan statistic restricted to open clusters within $L$. Because $L$ is a scaled version of $V_m$, $F_{\theta|t} \in \mathrm{AEP}(b,C)$ and $p_\theta(t) < p_c$, Lemma 5 applies to yield

\[ k_m^{1/b-1/2} (\alpha\log m)^{-1/b}\, U_L \to (d/C)^{1/b}. \]

We then conclude with the fact that $\alpha < 1$.

A.3.8. Proof of Theorem 5 and Theorem 6

The proof of Theorem 5 is parallel to that of Theorem 1 in Section A.3.1, but using Lemma 5 in place of Lemma A.2. Note that we use the fact that for $t$ and $\beta > 0$ fixed, $\gamma_{\theta|t}(\beta)$ is continuous and strictly increasing in $\theta$. This comes from Lemma A.10 and the fact that when $t$ is fixed, $F_{\theta|t}$ is also a natural exponential family with parameter $\theta$. Similarly, the proof of Theorem 6 is parallel to that of Theorem 2 in Section A.3.2. Further details are omitted.

A.3.9. Proof of Lemma 6

The proof is parallel to that of Lemma 5. In particular, we use the notation introduced there and only note where the arguments differ (although never substantially).

Part 1. In this case, by Lemma A.5 and Lemma A.6, there is only one open cluster with size $k_m$ or larger, and the result follows from, for example, Chebyshev's inequality.

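Part 2. Define $x_m = \sqrt{2\sigma^2 d(1-\delta\beta')\log m}$ and fix $\varepsilon > 0$. For the lower bound, we have, as in the proof of Lemma 5,

\[ R_m((1-\varepsilon)x_m) \ge T_m := N_m(k_m)\,\bar G_{k_m}((1-\varepsilon)x_m). \]

Fix $\eta > 0$. By Lemma A.4 (still valid) and (8),

\[ E(N_m(k_m)) \ge m^d \exp(-(1+\eta)\,\delta k_m^{(d-1)/d}) \]

for $m$ large enough. By Cramér's theorem and the fact that $\Lambda^*(x) \sim x^2/(2\sigma^2)$ when $x$ is small,

\begin{align*}
\bar G_{k_m}((1-\varepsilon)x_m) &\ge \exp(-(1+\eta)\,k_m \Lambda^*[(1-\varepsilon)x_m/\sqrt{k_m}]) \\
&\ge \exp(-(1+\eta)(1-\varepsilon/2)\,x_m^2/(2\sigma^2))
\end{align*}

for $m$ large enough. Thus,

\[ E(T_m) \ge \exp\big(d\log m - (1+\eta)(\delta k_m^{(d-1)/d} + (1-\varepsilon/2)\,x_m^2/(2\sigma^2))\big) \ge m^{\varepsilon d(1-\delta\beta')/4} \]

for $m$ large enough and $\eta$ small enough. For the variance, we use Lemma A.4 to get

\[ \mathrm{Var}(T_m) \le O(\log m)^{d/(d-1)}\, E(T_m). \]

We then conclude by Chebyshev's inequality.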

We now show that $R_m((1+\varepsilon)x_m) \to 0$ in probability. Equation (A.22) holds with $k_m^{(\eta)} := [(1+\eta)(d/\delta)\log m]^{d/(d-1)}$. As before,

\[ E(Z_m) \le m^d \exp\{-(1-\eta/2)\,\delta (k_m^{(\eta)})^{(d-1)/d}\} \to 0 \quad \text{as } m \to \infty. \]

By Lemma A.4 and (8),

\[ E(N_m(k)) \le m^d \exp(-(1-\eta)\,\delta k^{(d-1)/d}) \]

for $m$ large enough. The absence of a boundary to $V_m$ is being used here. The tail behavior of percolation clusters near the boundary of a box is not yet fully understood (see the remark in Section 5.2). By Chernoff's bound and the behavior of $\Lambda^*$ near the origin,

\[ \bar G_k((1+\varepsilon)x_m) \le \exp(-(1+\varepsilon)\,x_m^2/(2\sigma^2)) \]

for any $k \ge k_m$. Thus,

\begin{align*}
E(T_m) &\le 2 \sum_{k=k_m}^{k_m^{(\eta)}} m^d \exp\big(-(1-\eta)\,\delta k^{(d-1)/d} - (1+\varepsilon)\,x_m^2/(2\sigma^2)\big) \\
&\le O(\log m)^{d/(d-1)}\, m^{-\varepsilon d(1-\delta\beta')/4}
\end{align*}

for $m$ large enough and $\eta$ small enough.

Part 3. This part is even more similar to what we did in the proof of Lemma 5. The behavior of $U_m$ is driven by the open clusters of size of order $\log m$, with the only difference being that the term in $k^{(d-1)/d}$ from the bounds on $N_m(k)$ is negligible. Details are omitted.

A.3.10. Proof of Theorem 7

Without loss of generality, we assume that $\theta_m$ is bounded. By Lemma 6 and our assumptions on $k_m$, under the null, $U_m := U_m(t, k_m) \sim_P A(\log m)^{1/2}$ for a finite constant $A > 0$. We now consider the alternative, where the anomalous cluster is $K$. The contribution of the largest open cluster, $Q_m$, is

\[ \sqrt{|Q_m|}\,(\bar X_{Q_m} - \mu_{0|t}) = \frac{|Q_m \cap K|}{\sqrt{|Q_m|}}\,(\bar X_{Q_m \cap K} - \mu_{\theta_m|t}) + \frac{|Q_m \cap K^c|}{\sqrt{|Q_m|}}\,(\bar X_{Q_m \cap K^c} - \mu_{0|t}) + \frac{|Q_m \cap K|}{\sqrt{|Q_m|}}\,(\mu_{\theta_m|t} - \mu_{0|t}). \]

On the right-hand side, the first term is of order $o_P(1)$, and the second term is of order $O_P(1)$, by Chebyshev's inequality and the fact that, with probability tending to 1, $|Q_m \cap K| \asymp |K|$ and $|Q_m| \asymp |V_m|$, by Lemma A.5. The last term is of (exact) order $O(\theta_m m^{(\alpha-1/2)d})$, by the fact that $\mu_{\theta|t}$ is differentiable at $\theta = 0$ with derivative equal to $\sigma^2_{0|t} > 0$. Therefore, the ULS scan test is asymptotically powerful when $\liminf \theta_m m^{(\alpha-1/2)d} (\log m)^{-1/2}$ is large enough. (Note that this requires $\alpha > 1/2$.) If instead, we have $\theta_m m^{(\alpha-1/2)d} (\log m)^{-1/2} \to 0$, then the scan over $Q_m$ may be ignored, and we need to consider smaller clusters.

By Lemma A.6 and the upper bound on $k_m$, the second-largest cluster entirely within $K$ is scanned and its contribution is of order $O(\theta_m (\log m)^{d/(2(d-1))})$, by the same arguments that established the contribution of the largest open cluster. Thus, the ULS scan test is asymptotically powerful when $\liminf \theta_m (\log m)^{d/(2(d-1)) - 1/2}$ is large enough. If instead, $\theta_m (\log m)^{d/(2(d-1)) - 1/2} \to 0$, the test is asymptotically powerless. Indeed, let $L$ be the set of nodes within distance $(\log m)^2$ from $K$, and let $U_L$ be the result of scanning the open clusters of size at least $k_m$ and entirely within $L$. As argued in the proof of Proposition 2, this time using Lemma A.6, it is sufficient to show that $U_L \le A(\log m)^{1/2}$ with probability tending to 1 under $\mathbb{H}_{1,L}^m$. For any open cluster $Q$ entirely within $L$,

\[ \sqrt{|Q|}\,(\bar X_Q - \mu_{0|t}) = \sqrt{|Q|}\,(\bar X_Q - \mu_{\theta_m|t}) + \sqrt{|Q|}\,(\mu_{\theta_m|t} - \mu_{0|t}), \]

so that

\[ U_L \le \max_Q \sqrt{|Q|}\,(\bar X_Q - \mu_{\theta_m|t}) + o_P(1), \]

where the maximum is over open clusters of size at least $k_m$ and entirely within $L$, and the second term is $o_P(1)$ by Lemma A.6 and the size of $\theta_m$. Although $\theta_m$ varies, this maximum may be handled exactly as in Lemma 6, so that it is $\sim_P A(\alpha\log m)^{1/2}$, and we conclude.

A.3.11. Proof of Lemma 7

We prove only the more refined part. We use abbreviated notation as before, in particular, we omit the subscript 0, using $F_t = F_{0|t}$, $\sigma_t = \sigma_{0|t}$, and so on. The lower bound is obtained via $\mathrm{ULS}_m \ge U_m(t^*)/\sigma_{t^*}$, where $t^*$ defines $\Gamma(\beta)$, and applying Lemmas 5 or 6 to $U_m(t^*)$ depending on whether $t^* > t_c$ or $t^* < t_c$. For simplicity, we assume that $t^* \ne t_c$. If $t^* = t_c$, then we consider a nearby threshold and argue by continuity. For the upper bound, we
prove that P(ULSm > Xm) — > 0, where Xm ■= \J g log to and g> G := {dT{/3))^^^ . 

As $t$ increases, clusters are created and then destroyed in the coupled percolation processes. Suppose the removal at time $t$ from the percolation process of vertex $v$ creates some cluster $Q_t(w)$ at some neighbor $w$ of $v$. If $\mathrm{ULS}_m > x_m$, there must exist a vertex $v$ and a neighbor $w$ such that the cluster formed at $w$ at time $X_v$ contributes at some future time $t' \ge X_v$ an amount at least $x_m$ to $\mathrm{ULS}_m$. By conditioning on $v$, $X_v$, and $w$, one obtains that

\[ P(\mathrm{ULS}_m > x_m) \le o(1) + \int_{-\infty}^{t_\beta} P\Big(\bigcup_{v} \bigcup_{w \in \partial v} \Omega_t(w)\Big)\,dF(t), \tag{A.25} \]

where the $o(1)$ term covers the probability that the cluster at time $-\infty$, namely $V_m$, determines $\mathrm{ULS}_m$, or that a cluster at threshold $t > t_\beta$ is of size at least $k_m := \beta\log m$; $\partial v$ is the neighbor set of $v$; and $\Omega_t(w)$ is the event that:

1. $k := |Q_t(w)|$ satisfies $k \ge \beta\log m$,

2. there exists a time $t' \ge t$ such that $Q_t(w)$ still exists at time $t'$ and

3. $Y_{t'}(k) - E(Y_{t'}(k)) > x_m\sigma_{t'}\sqrt{k}$, where $Y_t(k)$ is the sum of a $k$-sample from $F_t$.

Assume (briefly) that $\sigma_t$ is non-decreasing, and note that $\mu_t$ is automatically non-decreasing. Then as in the proofs of Lemmas 5 and 6, and using similar notation,

\begin{align*}
P\Big(\bigcup_{v} \bigcup_{w \in \partial v} \Omega_t(w)\Big) &\le \sum_{v} \sum_{w \in \partial v} P\big(k := |Q_t(w)| \ge \beta\log m,\ Y_t(k) - E(Y_t(k)) > x_m\sigma_t\sqrt{k}\big) \\
&\le 2d\,E(R_t(x_m)), \qquad R_t(x) := \sum_{k \ge k_m} N_t(k)\,\bar G_t(k,x),
\end{align*}

where $N_t(k)$ is the number of $t$-open clusters of size $k$ and

\[ \bar G_t(k,x) = P(Y_t(k) - E(Y_t(k)) > x\sigma_t\sqrt{k}). \]

Therefore, by (A.25),

\[ P(\mathrm{ULS}_m > x_m) \le o(1) + 2d\Big(\int_{-\infty}^{t_c-h} + \int_{t_c+h}^{t_\beta}\Big) E(R_t(x_m))\,dF(t) + F(t_c+h) - F(t_c-h) \tag{A.26} \]

for any $h > 0$. We bound $E(R_t(x_m))$ as we did in the proofs of Lemmas 5 and 6. Explicitly, when $t_c + h \le t \le t_\beta$, we use Lemma A.4 and (A.1), to get

EiNt{k))<{l-p{t)f 



(l_e-Cp(t))2 



<C(/i,/3)fcexp(-fcCp(t.+ft)), C{h,l3) 



(1 — e~Cp{ta+h)'j2 ■ 
We use Chernoff's bound on $\bar G_t(k,x)$, to obtain

\[ E(R_t(x_m)) \le C(h,\beta)\,(k_t^{(h)})^2 \exp((1-A_t)\,d\log m) + \exp(-hd\log(m)/2), \]

where $k_t^{(h)} := (1+h)(d/\zeta_{p(t)})\log m$,

\[ A_t := \inf_{\beta \le s \le (1+h)/\zeta_{p(t)}} [s\Lambda^*_t(\mu_t + \sqrt{g/s}) + s\zeta_{p(t)}], \]

as in (A.24), and the last term is the probability that there is a $t$-open cluster of size exceeding $k_t^{(h)}$. Note that $A_t > 1$ for all $t_c + h \le t \le t_\beta$ because $g > G$. By continuity of $A_t$, $A_+ := \inf\{A_t: t_c + h \le t \le t_\beta\} > 1$. Hence, we have the following bound for all $t_c + h \le t \le t_\beta$,

\[ E(R_t(x_m)) \le C(h,\beta)\,[(1+h)(d/\zeta_{p(t_c+h)})\log m]^2\, m^{-(A_+-1)d} + \exp(-hd\log(m)/2). \]

When $t \le t_c - h$, we simply use the fact that

\[ \sum_k E(N_t(k)) \le |V_m| = m^d \]

and bound $\bar G_t(k,x)$ in the same way. We get

\[ E(R_t(x_m)) \le \exp((1-A_t)\,d\log m), \]

where

\[ A_t := \inf_{s \ge \beta} s\Lambda^*_t(\mu_t + \sqrt{g/s}). \]

Again, $A_t > 1$ for $t \le t_c - h$ and $A_t \to A_{-\infty} > 1$ as $t \to -\infty$. Hence, by continuity of $A_t$, $A_- := \inf\{A_t: t \le t_c - h\} > 1$, so that

\[ E(R_t(x_m)) \le m^{-(A_--1)d} \]

valid for all $t \le t_c - h$. Hence, the two integrals in (A.26) tend to zero with $m$. We then let $h \to 0$ so that $F(t_c+h) - F(t_c-h) \to 0$, because $F$ is continuous at $t_c$.

Assume now that $F$ has no atoms on $(-\infty, t_\beta]$. Then $\sigma_t$ is continuous on $(-\infty, t_\beta]$, and in fact is uniformly continuous because $\sigma_t \to \sigma$ when $t \to -\infty$. Moreover, because $\sigma_t$ is positive on that interval ($\sigma_t = 0$ would imply that $F_t$ is a point mass), $\underline{\sigma} := \min\{\sigma_t: t \le t_\beta\} > 0$. Because $g > G$ we can find $c > 0$ such that $g' := g(1-c)^2 > G$, and also $\eta > 0$ such that

\[ |\sigma_s - \sigma_t| \le c\,\underline{\sigma}, \quad \text{if } |s-t| \le \eta,\ s,t \le t_\beta. \tag{A.27} \]

Let $x'_m = \sqrt{g'\log m}$. We say that a cluster $Q$ scores at time $s$ if it exists at time $s$ and in addition

\[ |Q| \ge \beta\log m, \qquad \sum_{v \in Q} X_v \ge |Q|\,\mu_s + x_m\sigma_s\sqrt{|Q|}. \]

Without loss of generality, assume that $t_c$ is not an integer multiple of $\eta$. Fix two neighbors $v, w \in V_m$ and a time $t \le t_\beta$. If $\Omega_t(w)$ occurs then either:

(a) $Q_t(w)$ scores at some time $s \in [t, n_t\eta]$, where $n_t \in \mathbb{Z}$ satisfies $(n_t-1)\eta < t \le n_t\eta$, or

(b) there exists $n \ge n_t$ and $s \in [n\eta, (n+1)\eta)$ such that $Q_{n\eta}(w)$ scores at time $s$.

The latter possibility arises when $Q_t(w)$ scores at some time $s$ not belonging to the interval $[t, n_t\eta)$. Writing $[n\eta, (n+1)\eta)$ for the interval containing $s$, $Q_t(w)$ must exist at the start of this interval, which is to say that $Q_t(w) = Q_{n\eta}(w)$.

The probability of (a) is no larger than

\[ P\big(k := |Q_t(w)| \ge \beta\log m,\ \exists s \in [t, n_t\eta]: Y_t(k)/k \ge \mu_s + x_m\sigma_s/\sqrt{k}\big). \tag{A.28} \]

By (A.27) and the fact that $\mu_t$ is non-decreasing,

\[ \mu_s + \frac{x_m\sigma_s}{\sqrt{k}} \ge \mu_t + \frac{x'_m\sigma_t}{\sqrt{k}}, \tag{A.29} \]

so that (A.28) is no greater than

\[ P\big(k := |Q_t(w)| \ge \beta\log m,\ Y_t(k)/k \ge \mu_t + x'_m\sigma_t/\sqrt{k}\big). \tag{A.30} \]

Arguing similarly, part (b) has probability no greater than

\[ \sum_{t/\eta \le n \le t_\beta/\eta} P\big(k := |Q_t(w)| \ge \beta\log m,\ Y_{n\eta}(k)/k \ge \mu_{n\eta} + x'_m\sigma_{n\eta}/\sqrt{k}\big). \tag{A.31} \]

We divide the integral in (A.25) as follows

\[ \int_{-\infty}^{t_\beta} = \int_{-\infty}^{-1/h} + \int_{-1/h}^{t_c-h} + \int_{t_c-h}^{t_c+h} + \int_{t_c+h}^{t_\beta}. \]

The first integral is bounded by $F(-1/h)$ and the third integral by $F(t_c+h) - F(t_c-h)$, both terms vanishing as $h \to 0$. For the second and fourth integrals, we do exactly as before, separately for (A.30) and (A.31); for the latter, the sum has at most $(t_\beta + 1/h)/\eta + 1$ terms in the second integral and at most $(t_\beta - t_c - h)/\eta + 1$ terms in the fourth integral.

A.3.12. Proof of Theorem 8

By Lemma 7, $\mathrm{ULS}_m(k_m)$ is of order at most $\sqrt{\log m}$ under the null. Now consider the alternative with anomalous cluster $K$. If $0 < (\alpha - 1/2)d < \alpha/\nu$, consider the contribution of the largest open cluster at supercritical threshold $t$ and reason as in the proof of Theorem 7. Otherwise, consider the contribution of the largest open cluster at a threshold $t_m$ such that $p_c - p_0(t_m) \asymp m^{-\alpha/\nu}$. As in Theorem 4, the largest open cluster will be comparable in size to, and occupy a substantial portion of, $K$. Reasoning again as in the proof of Theorem 7, the contribution is of order $m^{\alpha d/2}\theta_m \ge m^{\alpha/\nu}\theta_m \ge m^{\alpha/\nu - \alpha/\nu'}$, which increases as a positive power of $m$.



Appendix B: The scan statistic as the GLR 

We show that the simple scan statistic defined in (1) approximates the scan statistic of Kulldorff [29], which is strictly speaking the GLR, defined as follows. The log-likelihood under $\mathbb{H}_{1,K}^m$ is given by

\[ \mathrm{loglik}(K,\theta,\theta_0) := |K|\,(\theta\bar X_K - \log\varphi(\theta)) + |K^c|\,(\theta_0\bar X_{K^c} - \log\varphi(\theta_0)). \]

Assuming $\theta$ and $\theta_0$ are both unknown, the log GLR is defined as

\[ \max_{K \in \mathcal{K}_m} \sup_{\theta \ge \theta_0} \mathrm{loglik}(K,\theta,\theta_0) - \sup_{\theta_0} \mathrm{loglik}(V_m,\theta_0,\theta_0), \]

which is equal to

\[ \max_{K \in \mathcal{K}_m} \big[\,|K|\Lambda^*(\bar X_K) + |K^c|\Lambda^*(\bar X_{K^c}) - |V_m|\Lambda^*(\bar X_{V_m})\big]_+ . \tag{B.1} \]

(The subscript $+$ denotes the positive part.)

Under the normal location model, $\Lambda^*(x) = x^2/2$ and (B.1) is equal to

\[ \frac12 \max_{K \in \mathcal{K}_m} \frac{|K|\,|V_m|}{|K^c|}\,(\bar X_K - \bar X_{V_m})_+^2 . \]

(We used the fact that $\bar X_K > \bar X_{K^c} \Leftrightarrow \bar X_K > \bar X_{V_m}$.) If $k_m^+ := \max\{|K|: K \in \mathcal{K}_m\}$ satisfies $k_m^+/|V_m| \to 0$, which is the case in our examples, the fraction above is equal to $|K|(1 + O(k_m^+/|V_m|))$. Moreover, knowing that there is always a cluster $K$ such that $\bar X_K \ge \bar X_{V_m}$, we get that the square root of (B.1) is approximately equal to

\[ \max_{K \in \mathcal{K}_m} \sqrt{|K|}\,(\bar X_K - \bar X_{V_m}), \tag{B.2} \]

which is the version of (1) when $\mu_0$ is unknown. (Note that $\bar X_{V_m} = \mu_0 + O_P(|V_m|)^{-1/2}$, by the central limit theorem, so that (B.2) is within $O_P(k_m^+/|V_m|)^{1/2}$ from (1).) This approximation is actually valid more generally, at least in a way that suffices for the asymptotic analysis that we perform in this work. Indeed, with $\sigma_0^2 = \mathrm{Var}_0(X_v)$, we have $\Lambda^*(x) = (x - \mu_0)^2/(2\sigma_0^2) + O(x - \mu_0)^3$ in the neighborhood of $\mu_0$. Assuming that $k_m^- := \min\{|K|: K \in \mathcal{K}_m\}$ satisfies $k_m^- \to \infty$, which is the case in our examples, the approximation of the square root of (B.1) by (B.2) is valid under the null, because $\bar X_K = \mu_0 + O_P(k_m^-)^{-1/2}$ and $\bar X_{K^c}, \bar X_{V_m} = \mu_0 + O_P(|V_m|)^{-1/2}$, by the central limit theorem and the fact that $k_m^- \to \infty$ and $k_m^+/|V_m| \to 0$. The same applies under the alternative if $\theta_m \to 0$, so that $\mu_{\theta_m} := E_{\theta_m}(X_v) \to \mu_0$, and therefore, $\bar X_K \to \mu_0$ in probability for any $K \in \mathcal{K}_m$. When $\theta_m$ is bounded away from 0, the two statistics, square root of (B.1) and (B.2), are both of order $\sqrt{|K|}$, where $K$ denotes the cluster under the alternative (or in the case of the ULS scan, the largest open cluster within the anomalous cluster). Taken together, these findings are sufficient to allow us to conclude that the tests based on (B.1) and (1) behave similarly.
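The normal-model reduction of (B.1) rests on an algebraic identity that is easy to check numerically. The sketch below assumes $\Lambda^*(x) = x^2/2$ and compares the bracket in (B.1), before the positive part is taken, with the closed form $\tfrac12\,|K|\,|V_m|\,(\bar X_K - \bar X_{V_m})^2/|K^c|$; the function names and the sample numbers are illustrative.

```python
import math


def glr_bracket_normal(size_k, size_total, mean_k, mean_rest):
    """Bracket in (B.1) for Lambda*(x) = x**2 / 2, before the positive part."""
    size_rest = size_total - size_k
    mean_total = (size_k * mean_k + size_rest * mean_rest) / size_total
    return 0.5 * (size_k * mean_k ** 2 + size_rest * mean_rest ** 2
                  - size_total * mean_total ** 2)


def glr_closed_form(size_k, size_total, mean_k, mean_rest):
    """Closed form: 0.5 * |K| * |V| / |K^c| * (mean_K - mean_V)**2."""
    size_rest = size_total - size_k
    mean_total = (size_k * mean_k + size_rest * mean_rest) / size_total
    return 0.5 * size_k * size_total / size_rest * (mean_k - mean_total) ** 2


def simple_scan_term(size_k, size_total, mean_k, mean_rest):
    """Term sqrt(|K|) * (mean_K - mean_V) appearing in (B.2)."""
    size_rest = size_total - size_k
    mean_total = (size_k * mean_k + size_rest * mean_rest) / size_total
    return math.sqrt(size_k) * (mean_k - mean_total)
```

For a cluster of 25 nodes in a network of 400, the square root of twice the closed form exceeds the term in (B.2) by the factor $\sqrt{|V_m|/|K^c|} \approx 1.03$, which illustrates the $1 + O(k_m^+/|V_m|)$ correction.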